# MIPS Assembly language summary

### **MIPS** operands

| Name | Example | Comments |
|------|---------|----------|
| 32 registers | \$s0-\$s7, \$t0-\$t9, \$zero, \$a0-\$a3, \$v0-\$v1, \$gp, \$fp, \$sp, \$ra, \$at | Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register \$zero always equals 0. Register \$at is reserved for the assembler to handle large constants. |
| 2<sup>30</sup> memory words | Memory[0], Memory[4], ..., Memory[4294967292] | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls. |

### **MIPS** assembly language

| Category | Instruction | Example | Meaning | Comments |
|----------|-------------|---------|---------|----------|
| Arithmetic | add | add \$s1, \$s2, \$s3 | \$s1 = \$s2 + \$s3 | Three operands; data in registers |
| | subtract | sub \$s1, \$s2, \$s3 | \$s1 = \$s2 - \$s3 | Three operands; data in registers |
| | add immediate | addi \$s1, \$s2, 100 | \$s1 = \$s2 + 100 | Used to add constants |
| Data transfer | load word | lw \$s1, 100(\$s2) | \$s1 = Memory[\$s2 + 100] | Word from memory to register |
| | store word | sw \$s1, 100(\$s2) | Memory[\$s2 + 100] = \$s1 | Word from register to memory |
| | load byte | lb \$s1, 100(\$s2) | \$s1 = Memory[\$s2 + 100] | Byte from memory to register |
| | store byte | sb \$s1, 100(\$s2) | Memory[\$s2 + 100] = \$s1 | Byte from register to memory |
| | load upper<br>immediate | lui \$s1, 100 | \$s1 = 100 * 2<sup>16</sup> | Loads constant in upper 16 bits |
| Conditional branch | branch on equal | beq \$s1, \$s2, 25 | if (\$s1 == \$s2) go to<br>PC + 4 + 100 | Equal test; PC-relative branch |
| | branch on not equal | bne \$s1, \$s2, 25 | if (\$s1 != \$s2) go to<br>PC + 4 + 100 | Not equal test; PC-relative |
| | set on less than | slt \$s1, \$s2, \$s3 | if (\$s2 < \$s3) \$s1 = 1;<br>else \$s1 = 0 | Compare less than; for beq, bne |
| | set less than<br>immediate | slti \$s1, \$s2, 100 | if (\$s2 < 100) \$s1 = 1;<br>else \$s1 = 0 | Compare less than constant |
| Unconditional jump | jump | j 2500 | go to 10000 | Jump to target address |
| | jump register | jr \$ra | go to \$ra | For switch, procedure return |
| | jump and link | jal 2500 | \$ra = PC + 4; go to 10000 | For procedure call |



### 1) Description of the Fetch unit

Here we design the Fetch Unit of a pipelined MIPS CPU.



Fig. 1 - The Fetch Unit

The Fetch Unit is the part of the CPU that fetches the instruction from the Instruction Memory (IMem) into the Instruction Register (IR\_reg). It also handles jumps and branches. The Fetch Unit's main components are the PC register (PC\_reg) and the IMem. The PC\_reg is a 32 bit register that advances by 4 on every clock cycle, so we also need a 32 bit adder that adds 4 to the current PC\_reg value. In order to jump or branch, we need to load the jump address or branch address into the PC register, so we place a multiplexer at the input of the PC register. This is depicted in Fig. 1 above.

#### a. Names & definition of signals inside the Fetch Unit

You must use these exact signal names in your design.

- 1. PC\_reg a 32 bit register. When RESET is '1', the PC\_reg value becomes 0x400000.
   (All other registers and FFs will be cleared by RESET='1').
- 2. PC\_plus\_4 a 32 bit signal that has the PC\_reg\_value + 4.
- 3. PC\_plus\_4\_pID a registered version of the PC\_plus\_4 to be used in the ID phase. This is why we added \_pID at the end of that signal name.
- 4. branch\_adrs a 32 bit signal which is made of PC\_plus\_4\_pID + (sext(imm) << 2). This is the address to be loaded into the PC when a successful branch is performed. The imm signal is made of the lower 16 bits of IR\_reg (see IR\_reg in #8 below).
- 5. jump\_adrs a 32 bit signal made of PC\_plus\_4\_pID[31:28] & IR[25:0] & b"00", i.e., the jump address in words multiplied by 4. This is the address to be loaded into the PC when a j or a jal instruction is performed.
- 6. jr\_adrs a 32 bit signal made of the Rs value in a jr instruction. Since we do not have a GPR file yet, we set the Rs value to x"00400004". In the complete CPU this will be the address to be loaded into the PC when a jr (jump register) instruction is performed.
- 7. PC\_source a 2 bit signal. When "00", PC\_reg is loaded with PC\_plus\_4. When "01", it is loaded with branch\_adrs; when "10", with jr\_adrs (the Rs value for a jr instruction); and when "11", with jump\_adrs.

  The PC\_source signal is created by a decoder looking at the opcode field of the instruction residing in the IR\_reg.
- 8. IR\_reg a 32 bit register that has the instruction we read from the IMem. This register is part of the IMem (the IMem is an already designed component we use in the Fetch Unit).
- 9. imm the 16 LSBs of IR\_reg
- 10. sext\_imm sign extension of imm to 32 bits
- 11. opcode the 6 MSBs of IR\_reg. We should determine the PC\_source value according to the instruction opcode (j, jal: "11"; beq, bne: "01"; jr: "10"; any other instruction: "00").
- 12. HOLD This signal is meant to freeze all registers when it is '1'. It will be used later for running the design in a single clock mode. In that mode this signal will be '1' all the time except for the clock cycles in which we want to perform a single clock "step". This means that all of the registers should have a HOLD input. The IMem itself and its output register (the IR) already support that signal.
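The address arithmetic in items 4 to 7 can be checked against a small behavioural model before writing the VHDL. The Python sketch below is only an illustration and not part of the assignment files. The function names are hypothetical, and detecting jr through the funct field of an R-type instruction is an assumption, since jr shares opcode 0 with the other R-type instructions.

```python
MASK32 = 0xFFFFFFFF
JR_ADRS = 0x00400004   # constant Rs value used while there is no GPR file (item 6)

# Opcode values; jr is an R-type instruction (opcode 0) with funct = 8
OP_RTYPE, OP_J, OP_JAL, OP_BEQ, OP_BNE = 0, 2, 3, 4, 5
FUNCT_JR = 8


def sext16(imm):
    """Sign-extend a 16 bit value to 32 bits (the sext_imm signal)."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm


def pc_source(ir):
    """Return the 2 bit PC_source value for the instruction held in IR_reg."""
    opcode = (ir >> 26) & 0x3F
    if opcode in (OP_J, OP_JAL):
        return 0b11
    if opcode in (OP_BEQ, OP_BNE):
        return 0b01          # in this stage every branch is treated as taken
    if opcode == OP_RTYPE and (ir & 0x3F) == FUNCT_JR:
        return 0b10          # assumption: jr is recognised by its funct field
    return 0b00


def next_pc(pc_plus_4, pc_plus_4_pid, ir):
    """Model of the mux at the PC_reg input."""
    branch_adrs = (pc_plus_4_pid + (sext16(ir & 0xFFFF) << 2)) & MASK32
    jump_adrs = (pc_plus_4_pid & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    mux = {0b00: pc_plus_4, 0b01: branch_adrs, 0b10: JR_ADRS, 0b11: jump_adrs}
    return mux[pc_source(ir)]
```

For example, a j instruction whose 26 bit field is 0x100000 yields jump\_adrs = 0x00400000, and a beq with imm = -1 branches to PC\_plus\_4\_pID - 4.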

### 1) The Rtype only MIPS CPU and its main components

We would like to design part of the MIPS CPU which is capable of running simple programs with Rtype instructions only. There are 3 main parts involved. These are the Fetch Unit from HW2, the GPR File and the MIPS ALU.

In this homework/lab exercise we will design the GPR File and the MIPS ALU. In the next exercise we will tie the GPR File, the MIPS ALU and the Fetch unit together to form an Rtype MIPS CPU.

Below we see a simplified drawing of the Rtype MIPS CPU.



Fig. 1 – The Rtype only MIPS CPU – a simplified drawing

### 2) GPR - design & simulation

The inside of the GPR File is made of a dual port memory. That memory does not have a register at its output as we had in the IMem we used in the Fetch Unit of HW2. Only the writing process of the memory is triggered by the rising edge of the clock. When "wr\_en"='1' and there is a rising edge in "wr\_clk", the "wr\_data" is written into the "wr\_address" location of the memory. Reading from the dual port memory is a combinational process.

Note that "wr\_address" and "rd\_address" are integers. If your address signal is a STD\_LOGIC\_VECTOR signal, you need to use the function conv\_integer(your signal vector name) to convert the STD\_LOGIC\_VECTOR value to an integer in order to set a value to the wr\_address or rd\_address.

We give you a vhd file called **single\_port\_memory.vhd** and you need to manipulate it to become a **dual\_port\_memory.vhd**. The skeleton of the **dual\_port\_memory.vhd** is given in the **dual\_port\_memory.empty** file so that you will use the signal names we decided on.

The outside to the GPR File is described in the skeleton file **GPR.empty**. In this file we implement the following:

- 2.1) Although the dual port memory we use has an address 0, so that we can write data into that address and read data from it, we will make sure that when we read from read\_reg1=0, we get rd\_data1=x"00000000".
- 2.2) Similarly, when reading from read\_reg2=0, we will get rd\_data2=x"00000000".
- 2.3) We will add a GPR\_hold input to the GPR file. When this input is '1' there should not be a write operation at the rising edge of the clock even if the RegWrite signal is '1'.
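The three requirements above can be summarised in a short behavioural model. This Python sketch is only a reference for reasoning about the expected waveforms, not the VHDL you submit. The class and method names, and the `wr_reg` argument, are hypothetical.

```python
class GPRModel:
    """Behavioural model of a 32 x 32 bit register file built on a dual port memory."""

    def __init__(self):
        self.mem = [0] * 32   # contents of the dual port memory

    def read(self, read_reg1, read_reg2):
        """Combinational read. Address 0 always returns zero (2.1 and 2.2)."""
        rd_data1 = 0 if read_reg1 == 0 else self.mem[read_reg1]
        rd_data2 = 0 if read_reg2 == 0 else self.mem[read_reg2]
        return rd_data1, rd_data2

    def rising_edge(self, reg_write, gpr_hold, wr_reg, wr_data):
        """Synchronous write, suppressed while GPR_hold is '1' (2.3)."""
        if reg_write and not gpr_hold:
            self.mem[wr_reg] = wr_data & 0xFFFFFFFF
```

Note that the model still stores a value written to address 0. Only the read path forces the zero, which matches the wording of 2.1.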

So you need to prepare the files:

- dual\_port\_memory.vhd that describes the dual port memory in which only the writing
  is synchronous (activated by the rising edge of the clock)
- **GPR.vhd** that "wraps" the dual\_port\_memory component of 32 addresses of 32 bits each and performs what was requested in 2.1 and 2.2 above

To ease the design for you the GPR.vhd content is depicted in Fig. 2 below.

Now you can run a simulation and check your design with the additional three files of:

- SIM\_GPR\_TB.vhd the TestBench file we prepared for you ahead of time
- SIM\_GPR\_TB\_data.dat the TestBench testing data file we prepared ahead of time
- SIM\_HW3\_GPR\_filenames.vhd In this file we specify the path of the data files used in simulation

With these 5 files you need to run the simulation and verify your design works fine. Note that you need to update the SIM\_HW3\_GPR\_filenames.vhd with the actual path of the SIM\_GPR\_TB\_data.dat file.



Fig. 2 – The inside of the GPR.vhd

You should submit a zip file of the entire GPR\_File simulation project – see detailed instructions at section 4 of this document.

Also you need to attach a doc file with screen captures describing the simulation you made. All i/o signals of GPR entity should be presented in the screen capture. Show the 2<sup>nd</sup> session of writing into the GPR File (clock cycles 46 to 55 = 920ns to 1100ns) and make sure that the values of all signals are readable. Explain what is seen in the rd\_data1 and rd\_data2 outputs of the GPR\_File in these clock cycles.

### 3) MIPS ALU – design & simulation

The MIPS ALU is a combinational circuit; no FFs are involved. Our ALU entity also includes the ALU\_src\_B multiplexer and the control logic that issues the ALU\_cmd signal. The ALU\_cmd is a 3 bit signal vector that determines which calculation the ALU performs.

If the ALU\_cmd is "010", the ALU performs an addition. If ALU\_cmd is "110", the ALU performs a subtraction. Here is the list of operations done by the ALU according to the ALU\_cmd bits:

- ALU\_cmd="000" => A and B
- ALU\_cmd="001" => A or B
- ALU\_cmd="010" => A + B
- ALU\_cmd="011" => A xor B
- ALU\_cmd="100" => A nand B (not used)
- ALU\_cmd="101" => A nor B (not used)
- ALU\_cmd="110" => A - B
- ALU\_cmd="111" => SLT. 1 if A<B, 0 if not. A & B are considered 2's complement numbers

The logic that drives the ALU\_cmd gets the 2 bit signal vector called ALUOP. When ALUOP="00", the ALU performs addition. When ALUOP="01", the ALU performs subtraction. When ALUOP="10", the ALU operation is determined by the 6 bit vector called Funct (function) that comes from the 6 LSBs of the IR\_reg. Here is the list of the Funct codes:

- Funct="100000" => ADD
- Funct="100010" => SUB
- Funct="100100" => AND
- Funct="100101" => OR
- Funct="100110" => XOR
- Funct="101010" => SLT

In all other cases we request to perform ADD.

The ALU\_src\_B mux selects what will be fed into the B input of the ALU. If ALUsrcB='0', we input the B\_in data into the ALU B input. If ALUsrcB='1', we input the sext\_imm data into the ALU B input.
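The ALU\_cmd decoding, the ALU operations and the ALU\_src\_B mux described above can be cross-checked with a small reference model. The Python sketch below is an illustration under the rules stated in this section; the function names are hypothetical, and results are truncated to 32 bits.

```python
MASK32 = 0xFFFFFFFF

# Funct code -> ALU_cmd, following the two lists above
FUNCT_TO_CMD = {
    0b100000: 0b010,  # ADD
    0b100010: 0b110,  # SUB
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b100110: 0b011,  # XOR
    0b101010: 0b111,  # SLT
}


def alu_cmd(aluop, funct):
    """Control logic: ALUOP and Funct -> 3 bit ALU_cmd. Default is ADD."""
    if aluop == 0b01:
        return 0b110
    if aluop == 0b10:
        return FUNCT_TO_CMD.get(funct, 0b010)
    return 0b010


def to_signed(x):
    """Interpret a 32 bit value as a 2's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(cmd, a, b):
    """Combinational ALU operating on 32 bit values."""
    ops = {
        0b000: a & b,
        0b001: a | b,
        0b010: a + b,
        0b011: a ^ b,
        0b100: ~(a & b),
        0b101: ~(a | b),
        0b110: a - b,
        0b111: 1 if to_signed(a) < to_signed(b) else 0,
    }
    return ops[cmd] & MASK32


def mips_alu(aluop, funct, alu_src_b, a_in, b_in, sext_imm):
    """ALU together with the ALU_src_B mux at its B input."""
    b = sext_imm if alu_src_b else b_in
    return alu(alu_cmd(aluop, funct), a_in & MASK32, b & MASK32)
```

For instance, `mips_alu(0b10, 0b100010, 0, 5, 7, 0)` models sub with A=5 and B=7 and gives 0xFFFFFFFE, the 2's complement encoding of -2.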

We prepared a MIPS\_ALU.empty file for your convenience.

You need to add all of the logic described above. When done, you should run a simulation using the MIPS\_ALU.vhd file and the additional three files of:

- SIM\_MIPS\_ALU\_TB.vhd the TestBench file we prepared for you ahead of time
- SIM\_MIPS\_ALU\_TB\_data.dat the TestBench Data file we prepared ahead of time
- SIM\_HW3\_ALU\_filenames.vhd In this file we specify the path of the data files used in simulation

With these 4 files you need to run the simulation and verify your design works fine. Note that you need to update the SIM\_HW3\_ALU\_filenames.vhd with the actual path of the SIM\_MIPS\_ALU\_TB\_data.dat file.

You should submit a zip file of the entire MIPS\_ALU simulation project (see detailed instructions in section 4 of this document). Also you need to attach a doc file with screen captures describing the entire simulation you made, until 1200 ns. All i/o signals of the MIPS\_ALU entity should be presented in the screen capture.

### 4) HW3 report

You should submit a single zip file for the Simulation of both entities. It should have three directories/folders. The first is called **GPR\_File**, the 2<sup>nd</sup> is called **MIPS\_ALU**, the 3<sup>rd</sup> is called **Disassembly**.

In the **GPR\_File** directory you will have the following 3 sub-directories:

- GPR\_File\_Src with all of your simulation sources
- GPR\_File\_Sim with the simulation project
- **GPR\_File\_Docs** Add a doc file with screen captures of the simulation showing the waveforms of the TB signals and the Console window. All i/o signals of the GPR entity should be presented in the screen capture. Show the 2<sup>nd</sup> session of writing into the GPR File (clock cycles 46 to 55 = 920ns to 1100ns) and make sure that the values of all signals are readable. Explain in detail what we see in rd\_data1 and rd\_data2 in these 10 clock cycles. The first few lines in the report will have your ID numbers (names are optional).

In the MIPS\_ALU directory you will have the following 3 sub-directories:

- MIPS\_ALU\_Src with all of your simulation sources
- MIPS\_ALU\_Sim with the simulation project
- MIPS\_ALU\_Docs Add a doc file with screen capture of the simulation showing the
  waveforms of the TB signal and the Console window. The screen captures should have
  the entire simulation you made (from its start to its end till 1200 ns), and all of the
  MIPS\_ALU i/o signals. No need to see the values of the signals, just the total picture and
  the console with a "Test Pass" message. The first few lines in the report will have your ID
  numbers (names are optional).

In the **Disassembly** directory you should have a doc file in which you disassemble a MIPS binary code and some explanations (answer questions).

See the questions in the file 18.1\_MIPS\_binary\_code\_for\_disassembly\_v4.docx

Note that the binary MIPS code you need to disassemble is the program we will be using in HW4. You need this disassembled code to understand what is done in HW3. That binary program to be disassembled appears in a Word file called 18.1\_MIPS\_binary\_code\_for\_disassembly\_v4.docx and also in a text file called 18.2\_MIPS\_binary\_code\_for\_disassembly\_v4.txt.

Use this file and add your disassembled code. See the appendix at the end of the document for MIPS instructions coding. Also explain in detail what is done by this code. Also explain how this code tests the GPR\_file and ALU parts of a MIPS CPU.

At the end of this assignment you will have the necessary building blocks for our next assignment, HW4 – the "Rtype" MIPS CPU.

### **Enjoy the assignment !!**

### 5) Appendix A – MIPS instructions coding

a. Codes of the Opcode fields - IR(31 downto 26)

```
sw     =[101011]=43
lw     =[100011]=35
lui    =[001111]=15
ori    =[001101]=13
addi   =[001000]=8
beq    =[000100]=4
bne    =[000101]=5
j      =[000010]=2
jal    =[000011]=3
R-type =[000000]=0
```

b. Function field codes for RType instructions - IR(5 downto 0)

```
add =[100000]=32
sub =[100010]=34
and =[100100]=36
or =[100101]=37
xor =[100110]=38
slt =[101010]=42
jr =[001000]=8
```

Rs, Rt and Rd fields have a 5 bit binary number of the register (0-31)
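To illustrate how these codes are used when disassembling by hand, the Python sketch below splits a 32 bit word into its fields and prints a mnemonic. It is only a helper for checking your own work and covers just the instructions listed in this appendix. The function name and the output format (numeric register names, the raw 26 bit field for j and jal) are assumptions.

```python
OPCODES = {43: "sw", 35: "lw", 15: "lui", 13: "ori", 8: "addi",
           4: "beq", 5: "bne", 2: "j", 3: "jal"}
FUNCTS = {32: "add", 34: "sub", 36: "and", 37: "or", 38: "xor", 42: "slt", 8: "jr"}


def disassemble(word):
    """Return a text form of one 32 bit MIPS instruction word."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    imm = word & 0xFFFF
    simm = imm - 0x10000 if imm & 0x8000 else imm   # sign-extended immediate

    if opcode == 0:                                 # R-type: look at the funct field
        name = FUNCTS.get(word & 0x3F)
        if name is None:
            return f".word 0x{word:08x}"
        if name == "jr":
            return f"jr ${rs}"
        return f"{name} ${rd}, ${rs}, ${rt}"

    name = OPCODES.get(opcode)
    if name is None:
        return f".word 0x{word:08x}"
    if name in ("j", "jal"):
        return f"{name} {word & 0x03FFFFFF}"        # target given in words
    if name in ("lw", "sw"):
        return f"{name} ${rt}, {simm}(${rs})"
    if name == "lui":
        return f"lui ${rt}, {imm}"
    if name == "ori":
        return f"ori ${rt}, ${rs}, {imm}"           # ori does not sign-extend
    if name in ("beq", "bne"):
        return f"{name} ${rs}, ${rt}, {simm}"
    return f"{name} ${rt}, ${rs}, {simm}"           # addi
```

For example, the word 0x00A81820 has opcode 0, Rs=5, Rt=8, Rd=3 and Funct=32, so it disassembles to `add $3, $5, $8`.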

### 1) The Rtype MIPS CPU and its main components

In HW3 we stated that we want to design part of the MIPS CPU which is capable of running simple programs with Rtype instructions only. There are 3 main parts involved. These are the Fetch Unit from HW2, the GPR File and the MIPS ALU. We built the last two components in HW3.

In this homework/lab exercise we are going to tie the GPR File, the MIPS ALU and the Fetch unit together to form an Rtype MIPS CPU.

Below we see a simplified drawing of the Rtype MIPS CPU we used in HW3.



Fig. 1 – The Rtype only MIPS CPU – a simplified drawing

In HW3 we called this CPU the Rtype only MIPS. However, in the Fetch Unit we already have the ability to support jump and branch instructions. Supporting **beq** and **bne** instructions might require some minor additions. In order to make things more interesting, we will also support the **addi** instruction. Thus, this "Rtype" MIPS CPU will start running from address 400000h and perform **Rtype** instructions and also **j**, **beq**, **bne** and **addi** instructions.

Some changes in the Fetch Unit are necessary to "tailor" it into the Rtype MIPS CPU. Our design of the Rtype MIPS CPU resides in the **HW4\_top.vhd**.

A more accurate description appears in Figure 2 below.



Fig. 2 - The Rtype MIPS or HW4\_MIPS CPU

### 2) HW4 Rtype MIPS CPU - design & simulation

The HW4 Rtype MIPS CPU will have four phases.

- **IF** Instruction Fetch, which is carried out inside the Fetch Unit, producing the instruction in the IR\_reg at the rising edge of the clock which ends the IF phase and starts the ID phase.
- **ID** Instruction Decode, which is the stage in which we do the following:
  - Decode the instruction now residing in the IR\_reg and decide what should be done.
    - This means we produce all control signals used by that instruction in all of its phases: ID, EX and WB.
  - Read Rs into A\_reg and Rt into B\_reg.

The rising edge of the clock sampling data into the A\_reg and B\_reg ends the ID phase and starts the EX phase.

- **EX** Execute, which is the phase in which the ALU calculates the result of A op B (in Rtype instructions) or A+sext\_imm (in addi instructions). The result is sampled into the ALUout\_reg at the rising edge of the clock which ends the EX phase and starts the WB phase.
  - In this phase we also select Rt or Rd as the GPR file destination register to be written into in the Write Back phase.
- **WB** Write Back, which is the final phase of the instruction. If this is an Rtype or addi instruction, we write the ALUout\_reg value into the GPR file. If this is a j, beq or bne instruction, we do nothing at that stage. The rising edge of the clock sampling data into the GPR File ends the WB phase and completes the instruction.

As explained above, the control signals are created by decoding the instruction residing in the IR\_reg at the ID phase. If a control signal is supposed to take effect in the EX phase, it must be delayed by 1 clock cycle. If it is supposed to take effect in the WB phase, it must be delayed by 2 clock cycles. You will have to handle these timing issues in order to make your design function properly.
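The delay rule above amounts to passing each control signal through a chain of flip-flops, one per phase. The Python sketch below models such a chain for a single control signal. It is only an illustration; the class name and its methods are hypothetical, and the HOLD behaviour assumes that a held register simply keeps its value.

```python
class DelayLine:
    """Chain of flip-flops: tap(n) is the input value from n rising edges ago."""

    def __init__(self, depth, reset_value=0):
        self.regs = [reset_value] * depth

    def rising_edge(self, d, hold=False):
        """Shift the ID-phase value d into the chain unless HOLD is active."""
        if not hold:
            self.regs = [d] + self.regs[:-1]

    def tap(self, n):
        """Value delayed by n clock cycles (n = 1 for EX, n = 2 for WB)."""
        return self.regs[n - 1]


# Example: RegWrite decoded as '1' for one instruction, then '0'
regwrite = DelayLine(depth=2)
regwrite.rising_edge(1)   # the instruction moves from ID to EX
regwrite.rising_edge(0)   # the instruction moves from EX to WB
# tap(1) now reflects the following instruction, tap(2) the original one
```

After the two edges in the example, `regwrite.tap(2)` is 1, so the write enable reaches the GPR File exactly when the instruction that requested it is in its WB phase.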



Fig. 2b - The Rtype MIPS control scheme

#### a. Modifications required in the Fetch Unit

We make the following changes in the Fetch\_Unit entity so that it can be used in the HW4 top design. See Fig. 3 below.

We remove all rdbk0-15 output signals from the Fetch Unit. We hope we won't need them since the Fetch Unit is already debugged and the changes we introduce are minor.

Instead we add output signals coming out of the Fetch\_Unit that should be used by the rest of the CPU. These <u>output</u> signals are:

- 1. IR\_reg\_pID This is a 32 bit signal of the IR\_reg (the instruction bits). We added pID to that signal name to indicate it is the IR\_reg value at the ID phase.
- 2. sext\_imm\_pID Similarly, this is the 32 bit sext\_imm signal we calculate at the ID phase. It is output from the Fetch Unit to be used later in the EX phase.
- 3. PC\_reg\_pIF this is the 32 bit PC\_reg we use during the IF phase for the Instruction Fetch, i.e., for reading from the IMem. It is output from the Fetch Unit to be used for verification purposes only (debugging).

These signals are used in the **HW4\_top** entity. They also allow us to test the IR\_reg and sext\_imm (and the PC\_reg) during simulation. Our Fetch\_Unit stays the same for simulation & implementation: no changes are required when going from the simulation phase to the implementation phase. Note that for TB purposes we output the CK\_out\_to\_TB, RESET\_out\_to\_TB, HOLD\_out\_to\_TB signals from the **HW4\_top\_4sim.vhd** which in HW4 is our top component. Therefore, when going from simulation to implementation, we will need to change the **HW4\_top** and remove these signals.

Now we add an input signal to the updated Fetch Unit.

1. We add the Rs\_equals\_Rt\_pID signal that tells us whether to branch in beq (if it is '1') or not (if it is '0'). This signal should come from comparing the two data outputs of the GPR File, which resides outside the Fetch\_Unit. You should modify the PC\_source signal decoder so that the beq and bne instructions are properly performed. Make sure that the addi instruction is also supported.
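One possible reading of the updated decoder is sketched below in Python. It is a hedged illustration rather than the required implementation: the function name is hypothetical, and it covers only the jump and branch instructions named for HW4, so every other opcode (including addi and the Rtype instructions) selects PC\_plus\_4.

```python
OP_J, OP_BEQ, OP_BNE = 2, 4, 5


def pc_source(opcode, rs_equals_rt_pid):
    """PC_source for the HW4 instruction set, given the comparison result."""
    if opcode == OP_J:
        return 0b11                      # jump_adrs
    if opcode == OP_BEQ and rs_equals_rt_pid:
        return 0b01                      # branch taken: Rs equals Rt
    if opcode == OP_BNE and not rs_equals_rt_pid:
        return 0b01                      # branch taken: Rs differs from Rt
    return 0b00                          # PC_plus_4 (Rtype, addi, branch not taken)
```

The difference from the HW2 decoder is that beq and bne no longer select branch\_adrs unconditionally.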

The rest of the Fetch\_Unit signals are left unchanged. See Fig. 3 below for the updated Fetch\_Unit with the new signals in red.

When simulating our top file is <code>HW4\_top\_4sim.vhd</code>. In this entity we will use the <code>BYOC\_Host\_Intf\_4sim.vhd</code> as our Host Interface circuit having the pre-loaded IMem. For implementation our top vhd file will be renamed to <code>HW4\_top.vhd</code> and inside it, we will use the <code>BYOC\_Host\_Intf.ngc</code> file. The difference between the two <code>Host\_Intf</code> versions is that in the sim version the Host Interface has the program already loaded inside (actually it is loaded at the beginning of the simulation). The implementation version includes the real <code>Host\_Intf</code> mechanism allowing us to load a program from the PC, run the design in single clock mode and see the readback signals. The difference between the <code>HW4\_top\_4sim.vhd</code> and the <code>HW4\_top.vhd</code> will be minor - removal of TB signals.



Fig. 3 – The updated Fetch\_Unit (new signals – in red)

Note that in the <code>HW4\_top\_4sim.empty</code> file we already connected all of the components (Fetch\_Unit, GPR, MIPS\_ALU, BYOC\_Host\_Intf; those are the blue, orange, green and pink parts of Fig. 2). We also defined all of the HW4\_top signals (see section <code>b</code> below). Your job is therefore to rename it to <code>HW4\_top\_4sim.vhd</code> and build the missing "logic" in the <code>HW4\_top\_4sim.vhd</code> (which is the yellow part in Fig. 2). That "logic" is made of the registers, FFs and combinational logic forming the Rtype MIPS CPU.

### 1. The HW5 MIPS CPU and its main components

In this assignment we add lw and sw instructions to the Rtype MIPS CPU we designed in HW4. This means we have to add the Data Memory (DMem) to our design. Following this we will have an almost complete MIPS CPU capable of performing Rtype, addi, j, beq, bne, lw and sw instructions. In our next & final assignment we will complete the CPU by adding jal, jr, lui and ori instructions and add forwarding to enhance the CPU performance.

The DMem we add is located inside the BYOC\_Host\_Intf component, which includes the infrastructure for loading data into the IMem and DMem memories.

Below we see a simplified drawing of the HW5 MIPS CPU we are going to build in this assignment.



Fig. 1 - The HW5 MIPS CPU

A more accurate drawing includes the BYOC\_Host\_Intf part – as depicted in Fig.2 below:



Fig. 2 – The HW5\_MIPS CPU with the Host\_Intf infrastructure

In your **HW5\_top** design, the only required connections to the **BYOC\_Host\_Intf** are the RESET & HOLD signals, the IMem connections, the DMem connections and the rdbk0-15 signals coming from the signals we want to check during implementation. These and the rest of the **BYOC\_Host\_Intf** connections are already given in the **HW5\_top\_4sim.empty** file.

You should rename that file to **HW5\_top\_4sim.vhd** and add the necessary equations (all based on the HW4 design). Below we describe the actual work required.

#### a. Connecting the DMem

The DMem signals are already connected in the **HW5\_top\_4sim.empty** file. There is no need to add any more DMem connections, but you need to understand these connections:

- 1. MIPS\_DMem\_adrs a 32 bit address signal of DMem is connected to ALUout\_reg signal.
- 2. MIPS\_DMem\_rd\_data the 32 bit data read from the DMem (we read from the address specified by MIPS\_DMem\_adrs). This data comes out of a register, i.e., it is actually the MDR data. It is directly connected to the HW5\_top signal called MDR\_reg.
- 3. MIPS\_DMem\_wr\_data 32 bit data to be written into the DMem at the address specified by MIPS\_DMem\_adrs at the rising edge of the CK if MIPS\_DMem\_we is '1'. It is connected to (i.e., driven by) the B\_reg\_pMEM signal of HW5\_top.
- 4. MIPS\_DMem\_we a '1' means data will be written into the DMem at the rising edge of the CK. This is driven by the MemWrite\_pMEM signal of HW5\_top.

#### b. Names & definition of signals inside the HW5\_top MIPS CPU

In your design, you should use the exact signal names as were used in the Rtype MIPS CPU of HW4 and <u>add</u> the following signals using the exact signal names shown below:

#### ID additional signals

- 5. MemWrite '1' when this is a sw instruction and we write into the DMem, '0' otherwise.
- 6. MemToReg '1' when we read from memory, i.e., in lw instruction.

#### EX phase signals

- 7. MemWrite\_pEX MemWrite delayed by 1 clock cycle.
- 8. MemToReg\_pEX MemToReg delayed by 1 clock cycle.

#### MEM phase signals

- 9. B\_reg\_pMEM a 32 bit register receiving the B\_reg signal (i.e., B\_reg delayed by 1 CK cycle). This register has the data to be written into the DMem in sw instruction.
- 10. Rd\_pMEM the output of RegDest mux selecting to which register the CPU writes in the WB phase.
- 11. MemWrite\_pMEM MemWrite\_pEX delayed by 1 clock cycle.
- 12. MemToReg\_pMEM MemToReg\_pEX delayed by 1 clock cycle.
- 13. RegWrite\_pMEM RegWrite\_pEX delayed by 1 clock cycle.

#### WB phase signals

- 14. MDR\_reg a 32 bit register that has the data read from the memory. This is a rename of the DMem\_rd\_data signal coming out of the **BYOC\_Host\_Intf\_4sim** component.
- 15. ALUout\_reg\_pWB a 32 bit register that has the ALUout\_reg data delayed by 1 CK cycle.
- 16. GPR\_wr\_data a 32 bit signal that is the output of the MemToReg mux (selecting between MDR\_reg and ALUout\_reg\_pWB).
- 17. Rd\_pWB Rd\_pMEM delayed by 1 clock cycle.
- 18. MemToReg\_pWB MemToReg\_pMEM delayed by 1 clock cycle
- 19. RegWrite\_pWB RegWrite\_pMEM delayed by 1 clock cycle.
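The MEM and WB phase signals above can be tied together in a small behavioural model of a sw followed by a lw. The Python sketch below is only an illustration. The class and function names are hypothetical, the memory is modelled per word, and returning the old contents when a read and a write hit the same address on the same edge is an assumption about the DMem, not something stated in this document.

```python
MASK32 = 0xFFFFFFFF


class DMemModel:
    """Stand-in for the DMem with a registered read output (the MDR)."""

    def __init__(self):
        self.words = {}      # word index -> 32 bit data
        self.mdr_reg = 0     # models MIPS_DMem_rd_data / MDR_reg

    def rising_edge(self, adrs, wr_data, we):
        """adrs is driven by ALUout_reg, wr_data by B_reg_pMEM, we by MemWrite_pMEM."""
        word = (adrs & MASK32) >> 2
        self.mdr_reg = self.words.get(word, 0)   # assumption: read sees the old data
        if we:
            self.words[word] = wr_data & MASK32


def gpr_wr_data(mem_to_reg_pwb, mdr_reg, aluout_reg_pwb):
    """MemToReg mux in the WB phase."""
    return mdr_reg if mem_to_reg_pwb else aluout_reg_pwb
```

With this model, a sw to address 0x10 followed by a lw from the same address leaves the stored value in `mdr_reg`, and `gpr_wr_data(1, ...)` passes it on to the GPR File.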



Fig. 1 - HW4 control scheme



Fig. 2 – HW5 control scheme

(Additions to HW4 control signals - in blue)

### 1. The HW6 MIPS CPU

In this assignment we add jal, jr, lui, ori instructions to the MIPS CPU we designed in HW5. Thus, we will have a CPU supporting Rtype (add, sub, and, or, xor, slt), addi, lui, ori, beq, bne, lw ,sw, j, jal and jr instructions. Besides adding these instructions we would like to add a forwarding mechanism to enhance the CPU performance.

It is highly recommended to watch the lecture in: <a href="http://youtu.be/Yu6FFVhI4D4">http://youtu.be/Yu6FFVhI4D4</a> and the first 11 minutes of: <a href="http://youtu.be/-fylybz8p">http://youtu.be/-fylybz8p</a> M

Below we remind you of the MIPS CPU we designed in HW5. It is almost the same as the HW6 MIPS CPU of this assignment.



Fig. 1 - The HW5 MIPS CPU

#### a. HW6 outline

This assignment has 3 parts. The first is to add the new instructions and it is described in section **b** below. It is recommended to fill up the table in **Appendix A** before starting to add the new instructions. We will not implement that design, just run the simulation. After a successful simulation of this part you should add the forwarding mechanism. This is done in two parts, data forwarding (which is described in section **c**) and branch forwarding (described in section **d**). Thus this HW has 3 parts: i) Add the new instructions ii) Add data forwarding iii) Add branch forwarding.

#### b. PART I - Adding the new instructions

- i. LUI The simplest way to add the lui instruction is to change the sign extension circuit so that when we have a lui instruction, it shifts imm left by 16 bits. We should make sure that the rest of the circuit behaves as it does for an addi instruction. For example, the ALU will add its A input value to the sext\_imm\_reg value that appears at its B input. Thus, we should make sure that the A\_reg value is 0. This can be done in several ways. The simplest way is to make sure that the assembler always translates a lui instruction so that Rs=0. Another way is to force Rs to be 0 (b"00000") when we decode a lui instruction.
- ii. ORI This instruction is almost the same as the addi one. There are two differences. The first is that in an ori instruction we should prevent sign extension of the imm. This is easily done by an additional change in the sign extension circuit. The other difference is forcing the ALU to perform an OR operation instead of an ADD one. The simplest way to do that is to use the 4<sup>th</sup> combination of the ALUOP vector signal. While b"00" means ADD, b"01" means SUB and b"10" means use the FUNCTION field to determine the ALU operation, we will add the combination b"11" and change the MIPS\_ALU so that ALUOP=b"11" results in an OR operation.
  - Thus, to support ORI, we should fix the sext\_imm circuit, force the ALUOP control signal to be b"11" and change the MIPS\_ALU to support this combination.
  - [The expected behavior of the ALUOP signal is: "10" in Rtype instructions, "01" in beq & bne, "11" in ori, "00" in all other instructions]
- iii. JR Supporting this instruction is pretty easy. We should direct the Rs content value (GPR\_rd\_data1) back into the Fetch\_Unit so that the jr\_adrs signal inside the Fetch Unit will get the GPR\_rd\_data1 instead of the constant x"00400004" we had so far. This means we need to add an input signal to the Fetch\_Unit entity. This new 32 bit input signal is called jr\_adrs\_in.
- iv. JAL Supporting this instruction is a little more involved. The jal should behave exactly as the j instruction in the Fetch Unit, so that when a j or jal instruction appears in the IR\_reg, the PC\_source will be "11" and the PC\_reg will get the "jump\_adrs" signal at its input. This makes sure that we jump properly in both cases. In jal we should also write the PC\_plus\_4 of the instruction to \$ra, i.e., to register \$31 in the GPR File. How do we do that? We "propagate" the PC\_plus\_4 value until the WB phase and there add it as an additional input to the MemToReg mux. We need to output the PC\_plus\_4\_pID from the Fetch\_Unit (this means a change in the i/o pins of the Fetch\_Unit). This signal needs to "propagate" until it becomes PC\_plus\_4\_reg\_pWB. We need to make sure we issue RegWrite='1' in jal and we should force "Rd" to be 31. Since the rule for the RegDst mux is that RegDst='1' only in Rtype instructions, RegDst is '0' in a jal instruction and the RegDst mux chooses Rd\_pMEM to be Rt\_pEX. This means that in a jal instruction we should force Rt to be 31 (b"11111").

To summarize: we need to support jal in the Fetch Unit the same way we do for the j instruction, we need to output PC\_plus\_4\_pID from the Fetch\_Unit and delay it till the WB phase, we need to issue RegWrite='1', we need to expand the MemToReg mux to write the PC\_plus\_4 in the WB phase of the jal instruction, and we need to force Rt to be 31 in the jal instruction.
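The write-back side of the jal changes summarized above can be sketched in VHDL as follows. This is a minimal sketch under assumptions, not the course solution: the entity name `wb_mux_jal`, the one-bit select `JAL_pWB` (a jal flag delayed to the WB phase) and the port names `MDR_pWB` and `wr_data` are hypothetical, and the text does not specify how the select of the MemToReg mux is widened.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone version of the expanded MemToReg mux.
-- JAL_pWB, MDR_pWB and wr_data are assumed names, not taken from the handout.
entity wb_mux_jal is
    port (
        MemToReg_pWB      : in  std_logic;  -- '1' selects the memory data (lw)
        JAL_pWB           : in  std_logic;  -- '1' when the instruction in WB is jal
        ALUOUT_pWB        : in  std_logic_vector(31 downto 0);
        MDR_pWB           : in  std_logic_vector(31 downto 0);
        PC_plus_4_reg_pWB : in  std_logic_vector(31 downto 0);
        wr_data           : out std_logic_vector(31 downto 0)  -- to the GPR File
    );
end entity wb_mux_jal;

architecture rtl of wb_mux_jal is
begin
    -- jal takes priority: the return address is written to $31
    wr_data <= PC_plus_4_reg_pWB when JAL_pWB = '1' else
               MDR_pWB           when MemToReg_pWB = '1' else
               ALUOUT_pWB;
end architecture rtl;
```

In the real design the same effect could be obtained by turning MemToReg into a 2-bit select; the priority form above is only one possible choice.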

See more in section **e** below.
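The expected ALUOP behavior listed for ori above ("10" in Rtype, "01" in beq and bne, "11" in ori, "00" otherwise) can be sketched as a small decoder. The entity name `aluop_decoder` and its stand-alone form are assumptions for illustration; in the actual CPU this logic is part of the control circuit. The opcode values are the standard MIPS ones (Rtype = 0, beq = 4, bne = 5, ori = 13).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone ALUOP decoder (illustration only).
entity aluop_decoder is
    port (
        opcode : in  std_logic_vector(5 downto 0);  -- IR(31 downto 26)
        ALUOP  : out std_logic_vector(1 downto 0)
    );
end entity aluop_decoder;

architecture rtl of aluop_decoder is
begin
    with opcode select
        ALUOP <= b"10" when b"000000",              -- Rtype: use the FUNCTION field
                 b"01" when b"000100" | b"000101",  -- beq, bne: SUB
                 b"11" when b"001101",              -- ori: OR
                 b"00" when others;                 -- all other instructions: ADD
end architecture rtl;
```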

#### c. PART II - Data forwarding

In a pipelined implementation of a CPU we encounter an inherent latency problem. The result of an add instruction (we will use the add instruction as an example, but the analysis also applies to all instructions writing back into the GPR File except lw and jal, i.e., Rtype, addi, lui and ori instructions) is available for a later instruction that uses it only after the WB phase of the add instruction is completed. The instruction using that result "reads" it from the GPR File in its ID phase. Thus, we need to wait 3 time slots before "using" the add result in a new instruction. This is depicted in Fig. 2 below. The updated value of \$3 is written into the GPR File at the rising edge of the clock ending the WB phase of the "add \$3,\$5,\$8" instruction (marked by the red line).

Thus, the ID phase of the "add \$y,\$3,\$x" instruction, which uses that value, can occur only to the right of the red line. We see that the inherent 5 CKs latency of the pipelined implementation results in "wasted" time slots.



Fig. 2 – The pipelined MIPS latency

We can use these time slots for other instructions that do not write to \$3 or \$x (the registers used by the "add \$y,\$3,\$x" instruction). A smart C compiler can therefore improve the situation. However, it is easy to overcome this problem and improve the situation dramatically by "Data Forwarding".

Data Forwarding means using the updated value to be written into the GPR File even before it is written into the GPR File. This is possible since that data already exists inside the pipeline – in most cases. We read data from the GPR File in the ID phase of an instruction in order to use it in the EX phase of the instruction. This means that the forwarding should occur in the EX phase or before it, in the ID phase of the instruction we want to forward the data to.

We have 3 cases of Data Forwarding.

- 1. Case I: Forward data from the previous instruction in the EX phase of the current instruction if the Rs or Rt of the current instruction is written into by the previous instruction.
  - I.e., if RegWrite\_pMEM='1' and Rd\_pMEM=Rs\_pEX, we should use the ALUout\_reg value instead of the A\_reg value.
  - Similarly, if RegWrite\_pMEM='1' and Rd\_pMEM=Rt\_pEX, we should use the ALUout\_reg value instead of the B\_reg value.
  - This is described by the arrow from the MEM phase of the 1<sup>st</sup> instruction (the top one) in Fig. 3, to the EX phase of the 2<sup>nd</sup> instruction.
- 2. Case II: Forward data from the instruction that was done 2 clocks ago in the EX phase of the current instruction if the Rs or Rt of the current instruction is written into by the instruction from 2 clocks ago.
  - I.e., if RegWrite\_pWB='1' and Rd\_pWB=Rs\_pEX, we should use the MemToReg mux output value instead of the A\_reg value.
  - Similarly, if RegWrite\_pWB='1' and Rd\_pWB=Rt\_pEX, we should use the MemToReg mux output value instead of the B\_reg value.
  - This is described by the arrow from the WB phase of the 1<sup>st</sup> instruction in Fig. 3, to the EX phase of the 3<sup>rd</sup> instruction.
- 3. Case III: Forward data from the instruction that was done 3 clocks ago. This is done in the ID phase of the current instruction (through a "transparent GPR") if the Rs or Rt of the current instruction is written into by the instruction from 3 clocks ago.
  - This means that inside the GPR, if rd\_reg1=wr\_reg and Reg\_Write='1', then we should bypass the GPR File and output the wr\_data instead of the "regular" rd\_data1. The same applies to rd\_reg2 and rd\_data2.

This is described by the arrow from the WB phase of the 1<sup>st</sup> instruction in Fig. 3, to the ID phase of the 4<sup>th</sup> instruction.
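The "transparent GPR" bypass of Case III can be sketched for one read port as shown below. It is a minimal sketch, assuming the register array itself already delivers its stored value on a hypothetical signal `rd_data1_raw`; the entity name `gpr_bypass` is also hypothetical. Excluding register \$0 from the bypass is an assumption here, since the handout defers the \$0 handling to Fig. 6.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical bypass mux around one GPR read port (Case III).
entity gpr_bypass is
    port (
        rd_reg1      : in  std_logic_vector(4 downto 0);   -- register being read
        wr_reg       : in  std_logic_vector(4 downto 0);   -- register being written
        Reg_Write    : in  std_logic;
        wr_data      : in  std_logic_vector(31 downto 0);  -- data being written in WB
        rd_data1_raw : in  std_logic_vector(31 downto 0);  -- assumed raw array output
        rd_data1     : out std_logic_vector(31 downto 0)
    );
end entity gpr_bypass;

architecture rtl of gpr_bypass is
begin
    -- Output the value being written when it targets the register being read.
    -- Assumption: reads of $0 are never bypassed, so $0 always reads as stored.
    rd_data1 <= wr_data when (Reg_Write = '1' and
                              rd_reg1 = wr_reg and
                              rd_reg1 /= b"00000")
                else rd_data1_raw;
end architecture rtl;
```

The second read port (rd\_reg2, rd\_data2) would use an identical mux.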



Fig. 3 – Data Forwarding timing diagram (from the 1<sup>st</sup> instruction to future instructions)



Fig. 3B – The 3 Data Forwarding options to an instruction (to the 4<sup>th</sup> instruction from previous instructions)

In Fig. 3B we see that the 1<sup>st</sup> instruction writes to register \$3, the 2<sup>nd</sup> instruction writes to register \$2 and the 3<sup>rd</sup> instruction writes to register \$3. We see the 3 forwarding mechanisms working to supply updated data to the 4<sup>th</sup> instruction. In the ID phase of the 4<sup>th</sup> instruction we read the result of the 1<sup>st</sup> instruction via the "transparent GPR" mechanism supporting forwarding from 3 instructions ago. In the EX phase of the 4<sup>th</sup> instruction we see forwarding of Rs from the previous instruction (in red) and from 2 instructions ago (in magenta).

In Fig. 4 and Fig. 5 below we see the MIPS data path without and with Data Forwarding. The changes are drawn in red. The connections shown in the MIPS data path in Fig. 5 support forwarding from previous instruction (case I) and from instruction before the previous one (case II). The forwarding through "transparent" GPR File (case III) is not shown in Fig. 5. It is described in Fig. 6 further below with the changes inside the GPR File also drawn in red.



Fig. 4 – MIPS data path (part) with no forwarding



Fig. 5 – MIPS Data Path with Data Forwarding



Fig. 6 – MIPS Data Path with Data Forwarding

The only two signals we added to the HW6 MIPS CPU to support Data Forwarding are A\_reg\_wt\_fwd and B\_reg\_wt\_fwd, which are the outputs of two new muxes at the A and B inputs of the ALU. Actually, we also need to keep the value of Rs till the EX phase so that we can use it to check whether forwarding data to the A input of the ALU is required. Thus, we also added the Rs\_pEX register.

#### Important notes:

- 1) No forwarding should be done if we read from register \$0 [see how it is handled in Fig. 6].
- 2) We need to make sure that we handle the situation properly also in cases where we have 2 or 3 previous instructions writing to the same register we are reading from in the current instruction.
- 3) We need to make sure that we use the correct data also in sw instruction.
- 4) This forwarding does not apply to a lw instruction (if it is the previous instruction), since a lw instruction has valid write data only at the WB phase, after the MEM phase, while other instructions that write to the GPR File, such as Rtype, addi, ori & lui, have their valid data after the EX phase, i.e., from the MEM phase on.
- 5) Similarly, jal data path is different than the regular instructions and data is available for forwarding only at the WB of the jal instruction.

After adding the data forwarding muxes, we should use A\_reg\_wt\_fwd instead of A\_reg wherever the A\_reg data was used and similarly, use B\_reg\_wt\_fwd instead of B\_reg wherever the B\_reg data was used.
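The mux that produces A\_reg\_wt\_fwd (Cases I and II, together with notes 1 and 2 above) can be sketched as follows. This is a hedged sketch: the entity name `fwd_mux_a` and the port name `wb_mux_out` (standing for the MemToReg mux output) are hypothetical, and the lw and jal exceptions of notes 4 and 5 are deliberately not handled here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical forwarding mux for the A input of the ALU.
-- Not handled (assumption): a lw or jal as the previous instruction.
entity fwd_mux_a is
    port (
        Rs_pEX        : in  std_logic_vector(4 downto 0);
        Rd_pMEM       : in  std_logic_vector(4 downto 0);
        RegWrite_pMEM : in  std_logic;
        Rd_pWB        : in  std_logic_vector(4 downto 0);
        RegWrite_pWB  : in  std_logic;
        A_reg         : in  std_logic_vector(31 downto 0);
        ALUout_reg    : in  std_logic_vector(31 downto 0);  -- Case I data
        wb_mux_out    : in  std_logic_vector(31 downto 0);  -- Case II data (assumed name)
        A_reg_wt_fwd  : out std_logic_vector(31 downto 0)
    );
end entity fwd_mux_a;

architecture rtl of fwd_mux_a is
begin
    -- Case I is tested first so that the most recent writer wins when two
    -- earlier instructions write the same register; $0 is never forwarded.
    A_reg_wt_fwd <=
        ALUout_reg when (Rs_pEX /= b"00000" and RegWrite_pMEM = '1' and Rd_pMEM = Rs_pEX) else
        wb_mux_out when (Rs_pEX /= b"00000" and RegWrite_pWB = '1' and Rd_pWB = Rs_pEX) else
        A_reg;
end architecture rtl;
```

The mux for B\_reg\_wt\_fwd would be identical with Rt\_pEX and B\_reg in place of Rs\_pEX and A\_reg.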

See more in section **e** below.

#### d. PART III - Branch forwarding

In a similar manner, the inherent latency of the pipeline also creates problems when we perform a branch instruction. In order to decide whether to branch or not, we compare the data values read from both outputs of the GPR File. This means we have to wait until the data inside the GPR File is updated before we can branch. As in the data case, we would like to build a forwarding mechanism allowing us to compare the right values as soon as they are available.



Fig. 7 – Branch Forwarding timing diagram

In Fig. 7 we see that an add instruction writes to register \$3. If we want to compare \$3 in our branch instruction, we must wait at least 1 time slot. This is so since the result of the add instruction is only available after the EX phase, i.e., from the MEM phase on. Since the branch comparison is done in its ID phase, the branch instruction's ID phase cannot be performed before the MEM phase of the add instruction. Waiting alone is not enough. We need a Branch Forwarding mechanism that will bring the updated MEM phase value to the Rs\_equals\_Rt comparator. Usually this comparator compares GPR\_rd\_data1 to GPR\_rd\_data2. Only when we compare a register that was written into (actually, will be written into) by the instruction before the previous one (2 instructions ago) do we need to forward the MEM phase data (which is the ALUOUT\_reg data). Note that we do not need to handle Branch Forwarding from earlier instructions: from 3 instructions ago, the "transparent GPR" of the data forwarding does that for us, and from 4 instructions ago there is no forwarding problem since the GPR File is updated on time.

You should add this mechanism as depicted in Fig. 8 below.



Fig. 8 – MIPS Data Path with Data and Branch Forwarding

The only two signals we added to the HW6 MIPS CPU to support Branch Forwarding are GPR\_rd\_data1\_wt\_fwd and GPR\_rd\_data2\_wt\_fwd that are the outputs of two new muxes at the inputs of the Rs\_equals\_Rt comparator.

Note that this mechanism should also be used by the jr instruction. In the jr instruction we have a similar latency issue. Instead of sending back the GPR\_rd\_data1 to the Fetch Unit, you need now to use the GPR\_rd\_data1\_wt\_fwd vector signal.
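The two branch-forwarding muxes and the comparator described above can be sketched as follows. This is a minimal sketch under assumptions: the entity name `branch_fwd` and the ID-phase register-number ports `Rs_pID` and `Rt_pID` (i.e., IR(25 downto 21) and IR(20 downto 16)) are hypothetical names, the \$0 exclusion mirrors the data forwarding note, and the case of a lw in the MEM phase is not handled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical branch forwarding block: forwards ALUOUT_reg (MEM phase data)
-- to the comparator inputs when the instruction in MEM writes Rs or Rt.
entity branch_fwd is
    port (
        Rs_pID              : in  std_logic_vector(4 downto 0);  -- assumed name
        Rt_pID              : in  std_logic_vector(4 downto 0);  -- assumed name
        Rd_pMEM             : in  std_logic_vector(4 downto 0);
        RegWrite_pMEM       : in  std_logic;
        ALUOUT_reg          : in  std_logic_vector(31 downto 0);
        GPR_rd_data1        : in  std_logic_vector(31 downto 0);
        GPR_rd_data2        : in  std_logic_vector(31 downto 0);
        GPR_rd_data1_wt_fwd : out std_logic_vector(31 downto 0);
        GPR_rd_data2_wt_fwd : out std_logic_vector(31 downto 0);
        Rs_equals_Rt        : out std_logic
    );
end entity branch_fwd;

architecture rtl of branch_fwd is
    -- Internal copies, since output ports cannot be read back in VHDL-93
    signal data1_fwd, data2_fwd : std_logic_vector(31 downto 0);
begin
    data1_fwd <= ALUOUT_reg when (RegWrite_pMEM = '1' and
                                  Rd_pMEM = Rs_pID and Rs_pID /= b"00000")
                 else GPR_rd_data1;

    data2_fwd <= ALUOUT_reg when (RegWrite_pMEM = '1' and
                                  Rd_pMEM = Rt_pID and Rt_pID /= b"00000")
                 else GPR_rd_data2;

    GPR_rd_data1_wt_fwd <= data1_fwd;  -- also the source for jr_adrs_in
    GPR_rd_data2_wt_fwd <= data2_fwd;

    Rs_equals_Rt <= '1' when data1_fwd = data2_fwd else '0';
end architecture rtl;
```

Forwarding from 3 instructions ago needs no extra logic here, because GPR\_rd\_data1 and GPR\_rd\_data2 already come out of the "transparent GPR".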

See more in section e below.

## <u>Appendix A</u> – IMem program for simulation – 2<sup>nd</sup> part

| Address          | label | Inst. | Rd/Rt | Rs  | Rt  | Imm/label | # remark                  | Inst. code |
|------------------|-------|-------|-------|-----|-----|-----------|---------------------------|------------|
| 4001A8           | cont: | addi  | \$1   | \$0 |     | 1000h     | # prep4 sw/lw test        | 20011000   |
| 4001AC           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001B0           |       | addi  | \$2   | \$0 |     | 5555h     |                           | 20025555   |
| 4001B4           |       | addi  | \$3   | \$0 |     | AAAAh     |                           | 2003AAAA   |
| 4001B8           |       | add   | \$1   | \$1 | \$1 |           | # 1 add once              | 00210820   |
| 4001BC           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001C0           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001C4           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001C8           |       | add   | \$1   | \$1 | \$1 |           | # 2 add for the 2nd time  | 00210820   |
| 4001CC           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001D0           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001D4           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001D8           |       | add   | \$1   | \$1 | \$1 |           | # 3                       | 00210820   |
| 4001DC           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001E0           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001E4           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001E8           |       | add   | \$1   | \$1 | \$1 |           | # 4                       | 00210820   |
| 4001EC           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001F0           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001F4           |       | nop   |       |     |     |           |                           | 00000000   |
| 4001F8           |       | add   | \$1   | \$1 | \$1 |           | #5                        | 00210820   |
| 4001FC           |       | nop   |       |     |     |           |                           | 00000000   |
| 400200           |       | nop   |       |     |     |           |                           | 00000000   |
| 400204           |       | nop   |       |     |     |           |                           | 00000000   |
| 400208           |       | add   | \$1   | \$1 | \$1 |           | # 6                       | 00210820   |
| 40020C           |       | nop   |       |     |     |           |                           | 00000000   |
| 400210           |       | nop   |       |     |     |           |                           | 00000000   |
| 400214           |       | nop   |       |     |     |           |                           | 00000000   |
| 400218           |       | add   | \$1   | \$1 | \$1 |           | #7                        | 00210820   |
| 40021C           |       | nop   |       |     |     |           |                           | 00000000   |
| 400220           |       | nop   |       |     |     |           |                           | 00000000   |
| 400224           |       | nop   |       |     |     |           |                           | 00000000   |
| 400228           |       | add   | \$1   | \$1 | \$1 |           | #8                        | 00210820   |
| 40022C           |       | nop   |       |     |     |           |                           | 00000000   |
| 400230           |       | nop   |       |     |     |           |                           | 00000000   |
| 400234           |       | nop   |       |     |     |           |                           | 00000000   |
| 400238           |       | add   | \$1   | \$1 | \$1 |           | # 9                       | 00210820   |
| 40023C           |       | nop   |       |     |     |           |                           | 00000000   |
| 400240           |       | nop   |       |     |     |           |                           | 00000000   |

| Address | Inst. | Rd/Rt | Rs  | Rt  | Imm/label | # remark            | Inst. code |
|--------|------|-----|-----|-----|---|--------------------------|----------|
| 400244 | nop  |     |     |     |   |                          | 00000000 |
| 400248 | add  | \$1 | \$1 | \$1 |   | # 10                     | 00210820 |
| 40024C | nop  |     |     |     |   |                          | 00000000 |
| 400250 | nop  |     |     |     |   |                          | 00000000 |
| 400254 | nop  |     |     |     |   |                          | 00000000 |
| 400258 | add  | \$1 | \$1 | \$1 |   | # 11                     | 00210820 |
| 40025C | nop  |     |     |     |   |                          | 00000000 |
| 400260 | nop  |     |     |     |   |                          | 00000000 |
| 400264 | nop  |     |     |     |   |                          | 00000000 |
| 400268 | add  | \$1 | \$1 | \$1 |   | # 12                     | 00210820 |
| 40026C | nop  |     |     |     |   |                          | 00000000 |
| 400270 | nop  |     |     |     |   |                          | 00000000 |
| 400274 | nop  |     |     |     |   |                          | 00000000 |
| 400278 | add  | \$1 | \$1 | \$1 |   | # 13                     | 00210820 |
| 40027C | nop  |     |     |     |   |                          | 00000000 |
| 400280 | nop  |     |     |     |   |                          | 00000000 |
| 400284 | nop  |     |     |     |   |                          | 00000000 |
| 400288 | add  | \$1 | \$1 | \$1 |   | # 14                     | 00210820 |
| 40028C | nop  |     |     |     |   |                          | 00000000 |
| 400290 | nop  |     |     |     |   |                          | 00000000 |
| 400294 | nop  |     |     |     |   |                          | 00000000 |
| 400298 | add  | \$1 | \$1 | \$1 |   | # 15                     | 00210820 |
| 40029C | nop  |     |     |     |   |                          | 00000000 |
| 4002A0 | nop  |     |     |     |   |                          | 00000000 |
| 4002A4 | nop  |     |     |     |   |                          | 00000000 |
| 4002A8 | add  | \$1 | \$1 | \$1 |   | # 16 - the 16th addition | 00210820 |
| 4002AC | nop  |     |     |     |   |                          | 00000000 |
| 4002B0 | nop  |     |     |     |   |                          | 00000000 |
| 4002B4 | nop  |     |     |     |   |                          | 00000000 |
| 4002B8 | sw   | \$2 | \$1 |     | 0 | # now \$1=???            | AC220000 |
| 4002BC | sw   | \$3 | \$1 |     | 4 |                          | AC230004 |
| 4002C0 | lw   | \$4 | \$1 |     | 0 |                          | 8C240000 |
| 4002C4 | lw   | \$5 | \$1 |     | 4 |                          | 8C250004 |
| 4002C8 | nop  |     |     |     |   |                          | 00000000 |
| 4002CC | nop  |     |     |     |   |                          | 00000000 |
| 4002D0 | nop  |     |     |     |   |                          | 00000000 |
| 4002D4 | add  | \$5 | \$5 | \$4 |   |                          | 00A42820 |
| 4002D8 | nop  |     |     |     |   |                          | 00000000 |
| 4002DC | nop  |     |     |     |   |                          | 00000000 |
| 4002E0 | nop  |     |     |     |   |                          | 00000000 |
| 4002E4 | addi | \$5 | \$5 |     | 1 |                          | 20A50001 |
| 4002E8 | nop  |     |     |     |   |                          | 00000000 |
| 4002EC | nop  |     |     |     |   |                          | 00000000 |

| Address | label  | Inst. | Rd/Rt | Rs | Imm/label | # remark   | Inst. code |
|--------|---------|-----|-----|-----|--------|----------------|----------|
| 4002F0 |         | nop |     |     |        |                | 00000000 |
| 4002F4 |         | bne | \$5 | \$0 | errlp  |                | 14A00007 |
| 4002F8 |         | nop |     |     |        |                | 00000000 |
| 4002FC |         | nop |     |     |        |                | 00000000 |
| 400300 |         | nop |     |     |        |                | 00000000 |
| 400304 | endlp:  | j   |     |     | endlp  |                | 081000C1 |
| 400308 |         | nop |     |     |        |                | 00000000 |
| 40030C |         | nop |     |     |        |                | 00000000 |
| 400310 |         | nop |     |     |        |                | 00000000 |
| 400314 | errlp:  | j   |     |     | errlp  |                | 081000C5 |
| 400318 |         | nop |     |     |        |                | 00000000 |
| 40031C |         | nop |     |     |        |                | 00000000 |
| 400320 |         | nop |     |     |        | end of errlp   | 00000000 |
| 400324 | endlp2: | j   |     |     | endlp2 |                | 081000C9 |
| 400328 |         | nop |     |     |        |                | 00000000 |
| 40032C |         | nop |     |     |        |                | 00000000 |
| 400330 |         | nop |     |     |        | end of program | 00000000 |

## <u>Appendix B</u> – Rect4 - IMem program for implementation

| Address in Hex | label | instruction | Rd/Rt | Rs/Rt | Rt | Imm/label | remark | MIPS Hex code |
|---------|---------|-------------|-----|-----|-----|---------|--------|----------|
| 400000  | main    | addi        | \$1 | \$0 |     | 64      |        | 20010040 |
| 400004  |         | addi        | \$2 | \$0 |     | 2000h   |        | 20022000 |
| 400008  |         | addi        | \$4 | \$0 |     | 16      |        | 20040010 |
| 40000C  |         | nop         |     |     |     |         |        | 00000000 |
| 400010  |         | nop         |     |     |     |         |        | 00000000 |
| 400014  | shft_lp | add         | \$2 | \$2 | \$2 |         |        | 00421020 |
| 400018  |         | addi        | \$4 | \$4 |     | -1      |        | 2084FFFF |
| 40001C  |         | nop         |     |     |     |         |        | 00000000 |
| 400020  |         | nop         |     |     |     |         |        | 00000000 |
| 400024  |         | nop         |     |     |     |         |        | 00000000 |
| 400028  |         | bne         | \$4 | \$0 |     | shft_lp |        | 1480FFFA |
| 40002C  |         | nop         |     |     |     |         |        | 00000000 |
| 400030  |         | addi        | \$2 | \$2 |     | 18h     |        | 20420018 |
| 400034  |         | addi        | \$3 | \$0 |     | -1      |        | 2003FFFF |
| 400038  |         | nop         |     |     |     |         |        | 00000000 |
| 40003C  |         | nop         |     |     |     |         |        | 00000000 |
| 400040  |         | nop         |     |     |     |         |        | 00000000 |
| 400044  | drawlp  | sw          | \$3 | \$2 |     | 0       |        | AC430000 |
| 400048  |         | addi        | \$1 | \$1 |     | -1      |        | 2021FFFF |
| 40004C  |         | addi        | \$2 | \$2 |     | 52      |        | 20420034 |
| 400050  |         | nop         |     |     |     |         |        | 00000000 |
| 400054  |         | nop         |     |     |     |         |        | 00000000 |
| 400058  |         | bne         | \$1 | \$0 |     | drawlp  |        | 1420FFFA |
| 40005C  |         | nop         |     |     |     |         |        | 00000000 |
| 400060  | end     | j           |     |     |     | end     |        | 08100018 |
| 400064  |         | nop         |     |     |     |         |        | 00000000 |

### **Simulation report**

3.1) The signals listed below should be presented in the screen capture you need to attach to your report. Show clock cycles 196-224 (following the end of the reset pulse, find i=196-224) and make the values of all signals readable. For this you will probably need to show clocks 196-210 and 210-224 separately. These are the signals that can help you in "testing" the DMem.



3.2) Explain in detail what happens, i.e., what do we see here. Note that it is essential to the success of your future design that you will verify that the design does what we wanted it to do in these CK cycles.

We store data with sw inside a loop, using a branch to repeat the store and fill the memory.

3.3) What is the latency of an R-type instruction? That is: How many nop-s should be inserted between two consecutive R-type instructions if the 2nd one uses the result of the 1st one?

There should be 3 nop-s between two consecutive R-type instructions, i.e., the latency of an R-type instruction is 3 CK cycles.

3.4) Explain the limitation of a beq that tests a register that is calculated by an Rtype instruction. As an example, translate the following C for statement: for (i=0;i<10;i++) { ... }

where i resides in register \$3.

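It takes 3 clocks for the result of an Rtype instruction to be written into the GPR File, so we need to make sure there are at least 3 clocks (e.g., 3 nop-s) between the Rtype instruction and the branch instruction that tests its result.

addi \$3,\$0,0 # i = 0

addi \$4,\$0,10 # loop limit

nop

nop

loop: addi \$3,\$3,1 # i++

nop

nop

nop

bne \$3,\$4,loop # repeat while i != 10

nop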

3.5) Are there any other limitations due to the pipeline structure in the instructions we implemented (Rtype, addi, beq, bne, lw, sw, j)? How can we overcome these limitations (e.g., by adding nop-s)? Try to list all of the **SW** & **HW** based solutions you can think of.

We need to make sure that there are enough clock cycles between memory write/read instructions (sw/lw) so that there is no conflict when reading from or writing to a specific register.

### Implementation report

1) What is the value of register \$2 after 122 cks?

The address of the first line: 0x20000018.

2) What happens after 126 CKs?

The drawing of the rectangle begins, i.e., the first line is drawn on the screen.

3) What happens when you press the RUN button?

A rectangle is drawn on the screen that is connected to the board.

4) Explain the **HW5\_rect4** program (what is the job of every register used. What is done in each loop, etc.)

The main aim of the program is to draw a rectangle on the screen. It does that as follows:

Register \$1 holds the number of lines of the rectangle that remain to be drawn.

Register \$2 holds the screen address that we want to write to.

Register \$3 is the value that paints specific pixels (each bit set to 1 turns on a pixel on the screen).

Register \$4 is the loop counter of shft\_lp, which builds the address in register \$2.

label shft\_lp: doubles register \$2 16 times (add \$2,\$2,\$2), i.e., shifts 2000h left by 16 bits to get 20000000h.

label drawlp: writes the rectangle to the screen, one line per iteration.

label end: end of program. An endless loop.

5) How long does it take [in seconds] to draw a 32x32 white square when we use the draw loop of the **HW5** rect4 program?

It takes 7 clock cycles to draw a single line, so drawing 32 lines takes 7 × 31 = 217 clock cycles, which is 8,680 nanoseconds, i.e., 8.68 × 10<sup>-6</sup> seconds.

6) Can you shorten the loop? If you can, write the code and explain.

The loop can be shortened by dropping the dedicated line counter (\$1) and instead comparing the pointer in register \$2 with its expected final value, i.e. 0x20000CE4 (0x20000018 + 52 × 63). Since bne compares two registers, the final value is held in a spare register (here \$4, loaded before the loop).

The code:

drawlp: sw \$3, 0(\$2) # write to screen

addi \$2,\$2,52 # increment ptr (\$2)

nop

nop

bne \$2,\$4, drawlp # if \$2 not end addrs goto drawlp

7) Can you think of a faster way to draw the square in the same short loop? If you can, write the code and explain.

A faster way to paint in the draw loop is to use a second register (\$5) that points to the row after the one addressed by register \$2, write two rows per iteration, and increase both registers by 104 instead of 52.

The code:

drawlp: sw \$3, 0(\$2) # write to screen (current row)

sw \$3, 0(\$5) # write to screen (next row)

addi \$1,\$1,-2 # decrement counter

addi \$2,\$2,104 # increment ptr (\$2)

addi \$5,\$5,104 # increment ptr (\$5)

nop

nop

bne \$1,\$0, drawlp # if cntr not 0 goto drawlp

A.1) Fill up the following table describing what happens in each CK cycle in all instructions. You should specify the specific operations that are required for the execution of the instruction.

We filled in the Rtype and j instructions – as examples. We also gave the list of required registers & signals to be mentioned in the table, in the ori instruction line.

| phase       | IF                       | ID                                                                          | EX                                                                                  | MEM                                              | WB                          |
|-------------|--------------------------|-----------------------------------------------------------------------------|-------------------------------------------------------------------------------------|--------------------------------------------------|-----------------------------|
| Instruction |                          |                                                                             |                                                                                     |                                                  |                             |
| Rtype       | IR=IMem[PC]<br>PC= PC+4  | A=GPR[Rs]<br>B=GPR[Rt]                                                      | ALUOUT = A op B                                                                     | ALUOUT_pWB=<br>ALUOUT<br>(ALUOUT is delayed 1ck) | GPR[Rd_pWB]<br>= ALUOUT_pWB |
|             |                          | Active signals:<br>RegDst='1'<br>RegWrite='1'<br>ALUOP="10"<br>MemToReg='0' | Rd is chosen:<br>Rd_pMEM=Rd_pEX                                                     |                                                  |                             |
| addi        | IR=IMem[PC]<br>PC= PC+4  | A=GPR[Rs] B=imm  ALUsrcB <= '1' RegWrite <= '1'                             | ALUOUT = A + B                                                                      | ALUOUT_pWB=<br>ALUOUT                            | GPR[Rd_pWB]<br>= ALUOUT_pWB |
| ori         | IR=IMem[PC]<br>PC= PC+4. | A=GPR[Rs] B=imm  ALUsrcB <= '1' ALUOP <= b"11" RegWrite <= '1'              | ALUOUT = A or B  All regs that are relevant (ALUOUT, B_reg_pMEM, Rd_pMEM, sext_imm) | ALUOUT_pWB=<br>ALUOUT                            | GPR[Rd_pWB]<br>= ALUOUT_pWB |
| lui         | IR=IMem[PC]<br>PC= PC+4  | A=imm  B=GPR[Rt]  ALUsrcB <= '1'  RegWrite <= '1'                           | ALUOUT = imm<<16                                                                    | ALUOUT_pWB=<br>ALUOUT                            | GPR[Rd_pWB]<br>= ALUOUT_pWB |
| beq         | IR=IMem[PC]<br>PC= PC+4  | PC= branch_adrs<br>ALUOP <= b"01"                                           | nothing                                                                             | nothing                                          | nothing                     |
| bne         | IR=IMem[PC]<br>PC= PC+4  | PC= branch_adrs<br>ALUOP <= b"01"                                           | nothing                                                                             | nothing                                          | nothing                     |

| Instruction | IF                      | ID                                                    | EX                                  | MEM                   | WB                                |
|-------------|-------------------------|-------------------------------------------------------|-------------------------------------|-----------------------|-----------------------------------|
| lw          | IR=IMem[PC]<br>PC= PC+4 | ALUsrcB <= '1'<br>MemToReg <= '1'<br>RegWrite <= '1'  | Rt and Rd registers delayed by 1 ck | MDR= DMem[adrs]       | GPR[Rd_pWB]<br>= MDR              |
| sw          | IR=IMem[PC]<br>PC= PC+4 | ALUsrcB <= '1'<br>MemWrite <= '1'                     | Rt and Rd registers delayed by 1 ck | DMem[adrs]=B_reg_pMEM | nothing                           |
| j           | IR=IMem[PC]<br>PC= PC+4 | PC= jump_adrs                                         | nothing                             | nothing               | nothing                           |
| jal         | IR=IMem[PC]<br>PC= PC+4 | PC= jump_adrs<br>RegWrite <= '1'<br>JAL <= '1'        | nothing                             | nothing               | GPR[31]<br>= PC_plus_4_reg_pWB    |
| jr          | IR=IMem[PC]<br>PC= PC+4 | PC= jr_adrs                                           | nothing                             | nothing               | nothing                           |

Answer the following questions.

A.2) Describe the changes done in order to support the ORI instruction.

- Disabling sign extension for ori:

```
process (imm, opcode)
begin
    if opcode = 15 then -- lui
        sext_imm <= imm & x"0000"; --@@@HW6
    elsif opcode /= 13 then -- @@@HW6 not ori
        if imm(15) = '0' then
            sext_imm <= x"0000" & imm;
        elsif imm(15) = '1' then
            sext_imm <= x"FFFF" & imm;
        end if;
    else -- ori: zero-extend the immediate
        sext_imm <= x"0000" & imm;
    end if;
end process;
```
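The behaviour of this process can be checked quickly outside the simulator. The following is a minimal Python sketch, not part of the VHDL design; the function name `extend_imm` is hypothetical, and it assumes the opcode values used above (15 for lui, 13 for ori).

```python
def extend_imm(opcode: int, imm: int) -> int:
    """Software model of the sext_imm process (32-bit result as an unsigned int)."""
    imm &= 0xFFFF
    if opcode == 15:                      # lui: immediate goes to the upper 16 bits
        return (imm << 16) & 0xFFFFFFFF
    if opcode == 13:                      # ori: zero-extend
        return imm
    # every other opcode: sign-extend bit 15
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


# Print the extended value for a few opcodes (addi = 8 is sign-extended)
for op, value in [(8, 0xFFFC), (13, 0xBEEF), (15, 100)]:
    print(op, hex(extend_imm(op, value)))
```

Comparing these values with the `sext_imm` waveform for the same inputs is one way to confirm the ori and lui branches.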

- RegWrite:

```
if Opcode = 0 or Opcode = 8 or Opcode = 13 or Opcode = 15 or Opcode = 3 or Opcode = 35 then --rtype or addi or ori or lui or jal or lw
    RegWrite <= '1';
else
    RegWrite <= '0';
end if;
```

ALUOP and ALUsrcB, and selecting the source of the second ALU operand:

Changes in the ALU file to support the instruction:

A.3) Describe the changes done in order to support the LUI instruction.

- Changing the sext_imm value:

```
process (imm, opcode)
begin
    if opcode = 15 then -- lui
        sext_imm <= imm & x"0000"; --@@@HW6
```

- Forcing Rs to zero:

```
if Opcode = 15 then --lui
    Rs <= b"00000";
else
    Rs <= IR_reg(25 downto 21);
end if;
```

- RegWrite:

```
if Opcode = 0 or Opcode = 8 or Opcode = 13 or Opcode = 15 or Opcode = 3 or Opcode = 35 then --rtype or addi or ori or lui or jal or lw
    RegWrite <= '1';
else
    RegWrite <= '0';
end if;
```

- ALUsrcB:

```
process (Opcode, IR_reg)
begin
    if Opcode = 8 or Opcode = 13 or Opcode = 15 or Opcode = 35 or Opcode = 43 then --addi or ori or lui or lw or sw
        ALUsrcB <= '1';
    else
        ALUsrcB <= '0';
    end if;
end process;
```

A.4) Describe the changes done in order to support the JR instruction.

- We created a new signal named jr\_adrs\_in as an input to the Fetch Unit:

```
-- JR address (create the jr_adrs signal)
jr_adrs <= jr_adrs_in; --@@@HW6
```

- In the top file, this signal gets its value from GPR\_rd\_data1 (the version with forwarding):

```
----@@HW6 add JR support
-- HW6 adding JR forwarding means a change here
jr_address <= GPR_rd_data1_wt_fwd;
```

A.5) Describe the changes done in order to support the JAL instruction.

- The PC register receives the relevant jump address:

```
process (RESET, CK, opcode)
begin
    if RESET = '1' then
        PC_reg <= x"00400000";
    elsif CK'event and CK = '1' and HOLD = '0' then
        if opcode = 2 or opcode = 3 then -- j or jal
            PC_reg <= jump_adrs; --@@@HW6
        else
            PC_reg <= PC_mux_out;
        end if;
    end if;
end process;
```

- We brought PC+4 out of the Fetch Unit and chained it through pipeline registers up to the WB stage.
- Asserting the JAL signal and forcing Rt to the value 31:

```
if Opcode = 3 then --jal
    JAL <= '1';
    Rt <= b"11111";
else
    JAL <= '0';
    Rt <= IR_reg(20 downto 16);
end if;
```

- RegWrite:

```
if Opcode = 0 or Opcode = 8 or Opcode = 13 or Opcode = 15 or Opcode = 3 or Opcode = 35 then --rtype or addi or ori or lui or jal or lw
    RegWrite <= '1';
else
    RegWrite <= '0';
end if;
```

- Changing the write-back mux so that it receives the updated PC+4:

```
process (MemToReg_pWB, MDR_reg, ALUout_reg_pWB, PC_plus_4_pWB, JAL_pWB)
begin
    if JAL_pWB = '1' then
        GPR_wr_data <= PC_plus_4_pWB;
    elsif MemToReg_pWB = '1' then
        GPR_wr_data <= MDR_reg;
    else
        GPR_wr_data <= ALUout_reg_pWB;
    end if;
end process;
```

In your answers, besides stating the reasoning in detail, show the relevant VHDL code sections to better explain your answers.

#### Answer the following questions.

- B.1) What are the limitations due to the pipeline latency of the following combinations:
  - lw after add where the add Rd is the lw Rs
  - lw after add where the add Rd is the lw Rt
  - add after lw where the lw Rt is the add Rt
  - beq after lw where the lw Rt is the beq Rs

Use a similar figure to Fig.2 and Fig. 3 to demonstrate your answers. Explain your answer!

B.1.a - lw after add where the add Rd is the lw Rs



B.1.b - lw after add where the add Rd is the lw Rt





B.1.d - beq after lw where the lw Rt is the beq Rs



B.2) What are the limitations of all cases of B.1 after you add the Data Forwarding?

<u>Explain your answer!</u>

#### B.2.a - lw after add where the add Rd is the lw Rs



Explanation: the add produces its result in the EX stage, and the result is forwarded to the ID stage of the lw.

#### B.2.b - lw after add where the add Rd is the lw Rt



Explanation: nothing is required, exactly as in section B.1.b.



Explanation: the lw gets its result in the MEM stage and forwards it to the ID stage of the add.

B.2.d - beq after lw where the lw Rt is the beq Rs



Explanation: the lw gets its result in the MEM stage, while the beq needs it in the ID stage, so a NOP has to be executed after the lw.

B.3) How many times do we perform the instruction following a jal instruction? Explain in detail. What are the implications? If this is a problem, what do you suggest in order to solve it?

The instruction is not executed. When an instruction is fetched, the PC is automatically advanced by 4, so the return address saved in register \$ra is PC+4, i.e. the second instruction after the JAL. The solution is to carry PC+4 through the pipeline to the WB stage.

B.4) How soon after jal instruction can we issue a jr \$31 instruction in order to return to the right location in the code? Give the answer before data forwarding is added and then after the data forwarding is added. <u>Explain your answer!</u>



PC+4 must first be stored in the GPR, and this can happen only after the ... of the PC+4.

With data forwarding:



Data forwarding passes the data to the EX stage, so the number of clock cycles to wait is the time until jr \$31 is performed, which happens in the ... stage.

Answer the following questions.

- C.1) What are the limitations due to the pipeline latency of the following combinations (assume Data Forwarding already exists):
  - beq after add where the add Rd is the beq Rt
  - beq after lw where the lw Rt is the beq Rs

Use a similar figure to Fig.2 and Fig. 3 to demonstrate your answers. Explain your answers!

C.1.a - beq after add where the add Rd is the beq Rt



Without branch forwarding, up-to-date data reaches the ID stage only through the transparent GPR.

C.1.b - beq after lw where the lw Rt is the beq Rs



Up-to-date data reaches the ID stage only through the transparent GPR.

C.2) What are the limitations of all cases of C.1 after you add the Branch Forwarding? Explain your answers!

C.2.a - beq after add where the add Rd is the beq Rt



C.2.b - beq after lw where the lw Rt is the beq Rs



For lw, the data can be forwarded only after it has been read from memory.

C.3) Why can't we check the result of the previous instruction (time slot n-1) by a beq instruction following it (time slot n)?

Because beq performs its comparison in the ID stage, we have to wait at least one NOP so that the previous instruction finishes its EX stage and its up-to-date result can be used.

C.4) List all of the limitations for Assembly programmer you can think of that still exist after adding the Data & Branch Forwarding circuits. <u>Explain your answer!</u>

- After an lw instruction, at least one NOP is required.
- Every branch instruction requires at least one NOP before it.

C.5) What is the shortest loop code possible (not an infinite loop)? Any limitations? Explain in detail.

lw \$8 0h<br>
lw \$9 (loop length)<br>
nop (if the first command in the loop is R-type)<br>
Loop: loop operations<br>
addi \$8 \$8 4<br>
nop<br>
bne \$8 \$9 Loop

## VHDL

```
-- General form of a clocked process:
--   process (list of clock related inputs)
--   begin
--       all commands are here
--       if comes with end if; and elsif if needed
--   end process;

process (CK)                     -- always starts with process; the sensitivity list = list of clock related inputs
begin                            -- always starts with begin
    if CK'event and CK='1' then  -- if we have a rise in the CK then we sample data to Y
        Y <= A;
    end if;                      -- the else is implied: if not, Y stays unchanged (Y <= Y;)
end process;                     -- ends with end process

-- Signals:
--   signal NAME : std_logic;                        a single bit
--   signal NAME : std_logic_vector (MSB downto 0);  several bits

-- Entity: describes the inputs and outputs of the device or component we want to define.
--   entity NAME is
--       Port (list of ins and outs);
--   end NAME;

entity mux_8x2to1 is
    Port (
        in0   : in  STD_LOGIC_VECTOR (7 downto 0);
        in1   : in  STD_LOGIC_VECTOR (7 downto 0);
        sel   : in  STD_LOGIC;
        out_y : out STD_LOGIC_VECTOR (7 downto 0)
    );
end mux_8x2to1;
```

# MIPS Assembler

#### Registers

| Name        | Number | Usage                                        |
|-------------|--------|----------------------------------------------|
| \$zero      | 0      | Constant value 0                             |
| \$v0 - \$v1 | 2-3    | Values for results and expression evaluation |
| \$a0 - \$a3 | 4-7    | Arguments                                    |
| \$t0 - \$t7 | 8-15   | Temporaries                                  |
| \$s0 - \$s7 | 16-23  | Saved                                        |
| \$t8 - \$t9 | 24-25  | More temporaries                             |
| \$gp        | 28     | Global pointer                               |
| \$sp        | 29     | Stack pointer                                |
| \$fp        | 30     | Frame pointer                                |
| \$ra        | 31     | Return address                               |

#### R-type instructions

Contains 32 bits, where:

| 31-26  | 25-21 | 20-16 | 15-11 | 10-6  | 5-0      |
|--------|-------|-------|-------|-------|----------|
| Opcode | $R_s$ | $R_t$ | $R_d$ | 00000 | Function |
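The field layout above can be turned into a short encoder. This is a minimal Python sketch, assuming the standard MIPS bit positions shown in the table; `encode_rtype` and `decode_fields` are hypothetical helper names, and the register numbers come from the register table (\$s1 = 17, \$s2 = 18, \$s3 = 19).

```python
def encode_rtype(rs: int, rt: int, rd: int, funct: int, shamt: int = 0, opcode: int = 0) -> int:
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word back into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $s1, $s2, $s3  ->  rd = 17, rs = 18, rt = 19, funct = 32
word = encode_rtype(rs=18, rt=19, rd=17, funct=32)
print(hex(word), decode_fields(word))
```

The same shifts and masks correspond to the `IR_reg(25 downto 21)` style slices used in the VHDL code.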

#### I-type instructions

Contains 32 bits, where:



#### J-type instructions



## <u>Execution stages of the machine</u>

1. Fetch: reading the instruction from memory, using the PC as a pointer.
2. Decode: decoding the instruction (deciding what to do in the next step) and reading the necessary registers (1 or 2).
3. Execute: computing the result or the memory address using the ALU.
4. Memory: using the ALU result to access memory if needed (reading data, writing data).
5. Write back: writing the result into the appropriate register (updating registers).
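To draw the kind of timing figure the B.1 and C.1 questions ask for, the five stages can be laid out per clock cycle. The following is a minimal Python sketch under the assumption of an ideal pipeline with no stalls (one instruction issued per cycle); the names `stage_in_cycle` and `diagram` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(slot: int, cycle: int):
    """Stage occupied in `cycle` by the instruction issued in time slot `slot` (None if not in the pipe)."""
    k = cycle - slot
    return STAGES[k] if 0 <= k < len(STAGES) else None


def diagram(program):
    """One text row per instruction, one 4-character column per clock cycle."""
    cycles = len(program) + len(STAGES) - 1
    rows = []
    for slot, name in enumerate(program):
        cells = [(stage_in_cycle(slot, c) or "").ljust(4) for c in range(cycles)]
        rows.append(f"{name:<5}" + "".join(cells).rstrip())
    return rows


for row in diagram(["add", "nop", "nop", "sub"]):
    print(row)
```

Reading a column of the printout shows which instruction is in ID while another one is in WB, which is the comparison the hazard questions rely on.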



Codes of the Opcode fields – IR (31 downto 26):

| Command | Binary Opcode | Decimal Value |
|---------|---------------|---------------|
| sw      | 101011        | 43            |
| lw      | 100011        | 35            |
| lui     | 001111        | 15            |
| ori     | 001101        | 13            |
| addi    | 001000        | 8             |
| beq     | 000100        | 4             |
| bne     | 000101        | 5             |
| j       | 000010        | 2             |
| jal     | 000011        | 3             |
| R-type  | 000000        | 0             |

Function field codes for R-type instructions – IR (5 downto 0):

| Command | Binary Function code | Decimal Value |
|---------|----------------------|---------------|
| add     | 100000        | 32            |
| sub     | 100010        | 34            |
| and     | 100100        | 36            |
| or      | 100101        | 37            |
| xor     | 100110        | 38            |
| slt     | 101010        | 42            |
| jr      | 001000        | 8             |
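The two tables together are enough to name an instruction from its 32-bit word. This is a minimal Python sketch of that lookup, limited to the instructions listed in the tables; `mnemonic` is a hypothetical helper name.

```python
# Opcode field, IR(31 downto 26), for non R-type instructions
OPCODES = {43: "sw", 35: "lw", 15: "lui", 13: "ori", 8: "addi",
           4: "beq", 5: "bne", 2: "j", 3: "jal"}
# Function field, IR(5 downto 0), used when the opcode is 0 (R-type)
FUNCTS = {32: "add", 34: "sub", 36: "and", 37: "or", 38: "xor", 42: "slt", 8: "jr"}


def mnemonic(word: int) -> str:
    """Return the instruction name for a 32-bit word, or 'unknown' if it is not in the tables."""
    opcode = (word >> 26) & 0x3F
    if opcode == 0:
        return FUNCTS.get(word & 0x3F, "unknown")
    return OPCODES.get(opcode, "unknown")


for w in (0x02538820, 0x8E510064, 0x03E00008):
    print(hex(w), mnemonic(w))
```

Note that the value 8 appears in both tables (addi as an opcode, jr as a function code), which is why the opcode must be checked first.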

# <u>Instruction summary</u>

| Category           | Instruction | Example                | Meaning                                    | Comments                                                                     |
|--------------------|-------------|------------------------|--------------------------------------------|------------------------------------------------------------------------------|
| Arithmetic         | add         | add \$s1, \$s2, \$s3   | \$s1 = \$s2 + \$s3                         | Data in registers                                                            |
|                    | sub         | sub \$s1, \$s2, \$s3   | \$s1 = \$s2 - \$s3                         | Data in registers                                                            |
|                    | addi        | addi \$s1, \$s2, 100   | \$s1 = \$s2 + 100                          | Adds a constant                                                              |
| Data transfer      | lw          | lw \$s1, 100(\$s2)     | \$s1 = Memory[\$s2 + 100]                  | Word from memory to register                                                 |
|                    | sw          | sw \$s1, 100(\$s2)     | Memory[\$s2 + 100] = \$s1                  | Word from register to memory                                                 |
|                    | lb          | lb \$s1, 100(\$s2)     | \$s1 = Memory[\$s2 + 100]                  | Byte from memory to register                                                 |
|                    | sb          | sb \$s1, 100(\$s2)     | Memory[\$s2 + 100] = \$s1                  | Byte from register to memory                                                 |
|                    | lui         | lui \$s1, 100          | \$s1 = 100 * 2<sup>16</sup>                | Loads a constant into the upper 16 bits                                      |
|                    | ori         | ori \$12, \$0, 0xBEEF  | \$12 = 0x0000BEEF                          | Loads a constant into the lower 16 bits                                      |
| Conditional branch | beq         | beq \$s1, \$s2, 25     | if (\$s1 == \$s2) go to PC + 4 + 100       | Branch on equal                                                              |
|                    | bne         | bne \$s1, \$s2, 25     | if (\$s1 != \$s2) go to PC + 4 + 100       | Branch on not equal                                                          |
|                    | slt         | slt \$s1, \$s2, \$s3   | if (\$s2 < \$s3) \$s1 = 1; else \$s1 = 0;  | Set on less than                                                             |
|                    | slti        | slti \$s1, \$s2, 100   | if (\$s2 < 100) \$s1 = 1; else \$s1 = 0;   | Set on less than immediate                                                   |
| Jump               | j           | j 2500                 | go to 10000                                | Jump to address                                                              |
|                    | jr          | jr \$ra                | go to \$ra                                 | Jump to the address in a register                                            |
|                    | jal         | jal 2500               | \$ra = PC + 4; go to 10000                 | Jump to address and return to the address after the original instruction     |
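The numbers in the table (25 becoming +100, 2500 becoming 10000, 100 becoming 100 * 2<sup>16</sup>) follow from the word-to-byte scaling of branch and jump fields. Below is a minimal Python sketch of those calculations, assuming the standard MIPS conventions (16-bit signed word offset for branches, 26-bit word target for jumps); the function names are hypothetical.

```python
def branch_target(pc: int, imm16: int) -> int:
    """beq/bne target: PC + 4 + 4 * sign-extended immediate."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc + 4 + 4 * offset) & 0xFFFFFFFF


def jump_target(pc: int, target26: int) -> int:
    """j/jal target: upper 4 bits of PC + 4 combined with the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((target26 << 2) & 0x0FFFFFFF)


def lui_value(imm16: int) -> int:
    """lui result: the immediate placed in the upper 16 bits."""
    return (imm16 << 16) & 0xFFFFFFFF


print(branch_target(0, 25) - 4)   # byte distance from PC + 4 for "beq ..., 25"
print(jump_target(0, 2500))       # byte address reached by "j 2500"
print(lui_value(100))             # value loaded by "lui $s1, 100"
```

A negative branch immediate (for example 0xFFFF, which is -1) gives a target of PC + 4 - 4, i.e. the branch instruction itself.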

## <u>Important points</u>

- GPR (general-purpose registers): all 32 registers of the processor, each 32 bits wide.
- Between any two R-type instructions that touch the same register, 2 nops are needed. This can be solved with data forwarding.
  - Data forwarding: passes data to the EX stage from 1-3 steps back. When is it needed? When we access a register that we wrote to 1-3 steps earlier.
  - Branch forwarding: passes data to the ID stage.
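The nop counts mentioned in these points can be reproduced with a small timing model. This is a hedged Python sketch, not the course's forwarding circuit: it assumes a textbook 5-stage pipeline, a transparent GPR (a value written in WB can be read in ID during the same cycle), and forwarding paths that deliver a result in the cycle after it is produced. The helper name `nops_needed` and the `ready_cycle` parameter are hypothetical.

```python
STAGE = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}


def nops_needed(ready_cycle: int, use_stage: str) -> int:
    """Number of nops between producer and consumer.

    ready_cycle: first cycle, counted from the producer's IF (cycle 0),
                 in which the consumer can pick up the value.
    use_stage:   pipeline stage in which the consumer needs the value.
    """
    return max(0, ready_cycle - STAGE[use_stage] - 1)


cases = {
    "no forwarding, R-type -> R-type (read in ID, written in WB)": nops_needed(4, "ID"),
    "data forwarding, add -> next instruction's EX": nops_needed(3, "EX"),
    "data forwarding, lw -> next instruction's EX": nops_needed(4, "EX"),
    "branch forwarding, add -> beq's ID": nops_needed(3, "ID"),
}
for name, nops in cases.items():
    print(f"{name}: {nops} nop(s)")
```

Under these assumptions the model agrees with the notes: two nops without forwarding, one nop after lw, and at least one nop before a branch that depends on the previous instruction.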

```
-- This module is the HW6 top entity for simulation; see --@@@HW6 for HW6 related changes
--
-- It supports Rtype instructions of: add, sub, and, or, xor, slt
-- It also supports addi, beq, bne, lw & sw instructions
-- It also supports lui, ori, jr & jal instructions
--
-- There are 5 phases in HW6 MIPS CPU: IF, ID, EX, MEM, WB
--
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;

entity HW6_top is
Port (
    -- Infrastructure signals [To be used by PC via RS232 or from Nexys2 board switches & pushbuttons and VGA signals to the screen]
    -- Host intf signals
    RS232_Rx     : in    STD_LOGIC;
    RS232_Tx     : out   STD_LOGIC;
    -- VGA signals
    VGA_h_sync   : out   STD_LOGIC;
    VGA_v_sync   : out   STD_LOGIC;
    VGA_red0     : out   STD_LOGIC;
    VGA_red1     : out   STD_LOGIC;
    VGA_red2     : out   STD_LOGIC;
    VGA_grn0     : out   STD_LOGIC;
    VGA_grn1     : out   STD_LOGIC;
    VGA_grn2     : out   STD_LOGIC;
    VGA_blu1     : out   STD_LOGIC;
    VGA_blu2     : out   STD_LOGIC;
    -- Flash Mem signals
    MT_ce_n      : out   STD_LOGIC; -- '0' when accessing MOBILE SDRAM mem
    Flash_adrs   : out   STD_LOGIC_VECTOR (23 downto 1); -- Flash read/write address
    Flash_ce_n   : out   STD_LOGIC; -- '0' when accessing Flash mem
    Flash_we_n   : out   STD_LOGIC; -- '0' when writing to Flash mem
    Flash_oe_n   : out   STD_LOGIC; -- '0' when reading from Flash mem
    Flash_rp_n   : out   STD_LOGIC; -- '0' when resetting Flash mem
    Flash_sts    : in    STD_LOGIC; -- '1' when Flash mem FSM is done
    Flash_data   : inout STD_LOGIC_VECTOR (15 downto 0); -- Data read from Imem or Dmem to be written to Flash mem or data read from Flash mem to be written to I
    -- KBD signals
    PS2C         : in    STD_LOGIC; -- PS2 keyboard clock
    PS2D         : in    STD_LOGIC; -- PS2 keyboard data
    -- general signals
    leds_out     : out   STD_LOGIC_VECTOR (7 downto 0); -- 7=Flash_stts, 6=MIPS_ck, 5-0=Host_intf version
    CK_50MHz     : in    STD_LOGIC;
    buttons_in   : in    STD_LOGIC_VECTOR (3 downto 0); -- btn0 is single clock (manual clock), btn3 is manual reset
    switches_in  : in    STD_LOGIC_VECTOR (7 downto 0); -- 4-0 to select which part to be displayed on the 7-segment LEDs
    sevenseg_out : out   STD_LOGIC_VECTOR (6 downto 0); -- to the 7 seg LEDs
    anodes_out   : out   STD_LOGIC_VECTOR (3 downto 0)  -- to the 7 seg LEDs
);
end HW6_top;

architecture Behavioral of HW6_top is

constant MIPS_data_width : INTEGER := 32; -- data width in bits
constant MIPS_adrs_width : INTEGER := 32; -- Full address width of MIPS CPU

-- Put here all the components used: Clock_Driver, BYOC_Host_intf, your components

COMPONENT Clock_Driver is
port (
    CK_50MHz_IN  : in  std_logic;
    CK_25MHz_OUT : out std_logic
);
END COMPONENT;

COMPONENT BYOC_Host_intf is
Port (
    -- MIPS signals [to be used by students]
    MIPS_reset        : out STD_LOGIC; -- output to the Student's design
    MIPS_hold         : out STD_LOGIC; -- output to the Student's design
    -- MIPS IMem signals
    MIPS_IMem_adrs    : in  STD_LOGIC_VECTOR (31 downto 0); -- MIPS IMem read address
    MIPS_IMem_rd_data : out STD_LOGIC_VECTOR (31 downto 0); -- read data (sync read - at the rising edge of MIPS_ck, all the time)
    -- MIPS DMem signals
    MIPS_DMem_we      : in  STD_LOGIC; -- '1' when the CPU writes to MIPS_DMem (MIPS_DMem_wr_data is written to MIPS_DMem_adrs at the rising edge of MIPS_ck)
    MIPS_DMem_adrs    : in  STD_LOGIC_VECTOR (31 downto 0); -- MIPS DMem read/write address
    MIPS_DMem_wr_data : in  STD_LOGIC_VECTOR (31 downto 0); -- write data (sync write - at the rising edge of MIPS_ck, if MIPS_DMem_we='1')
    MIPS_DMem_rd_data : out STD_LOGIC_VECTOR (31 downto 0); -- read data (sync read - at the rising edge of MIPS_ck, all the time)
```

```
    -- Flash Mem signals
    Flash_adrs    : out STD_LOGIC_VECTOR (23 downto 1); -- Flash read/write address
    Flash_ce_n    : out STD_LOGIC; -- '1' when accessing Flash mem
    Flash_we_n    : out STD_LOGIC; -- '1' when writing to Flash mem
    Flash_oe_n    : out STD_LOGIC; -- '1' when reading from Flash mem
    Flash_rp_n    : out STD_LOGIC; -- '0' when resetting Flash mem
    Flash_sts     : in  STD_LOGIC; -- '1' when Flash mem FSM is done
    Flash_rd_data : in  STD_LOGIC_VECTOR (15 downto 0); -- data read from Flash mem to be written to Imem or Dmem
    Flash_wr_data : out STD_LOGIC_VECTOR (15 downto 0); -- Data read from Imem or Dmem to be written to Flash
    -- Infrastructure signals [To be used by PC via RS232 or from Nexys2 board switches & pushbuttons, and VGA signals to the screen]
    -- Host intf signals
    RS232_Rx      : in  STD_LOGIC;
    RS232_Tx      : out STD_LOGIC;
    -- VGA signals
    VGA_h_sync    : out STD_LOGIC;
    VGA_v_sync    : out STD_LOGIC;
    VGA_red0      : out STD_LOGIC;
    VGA_red1      : out STD_LOGIC;
    VGA_red2      : out STD_LOGIC;
    VGA_grn0      : out STD_LOGIC;
    VGA_grn1      : out STD_LOGIC;
    VGA_grn2      : out STD_LOGIC;
    VGA_blu1      : out STD_LOGIC;
    VGA_blu2      : out STD_LOGIC;
    -- PS2 kbd signals
    PS2_kbd_ck    : in  STD_LOGIC;
    PS2_kbd_data  : in  STD_LOGIC;
    -- general signals
    CK_25MHz      : in  STD_LOGIC; -- main clock input to the Host interface. From this clock we create all other clock signals in the design
    buttons_in    : in  STD_LOGIC_VECTOR (3 downto 0); -- btn0 is single clock (manual clock), btn3 is manual reset
    switches_in   : in  STD_LOGIC_VECTOR (7 downto 0); -- 4-0 to select which part to be displayed on the 7-segment LEDs
    sevenseg_out  : out STD_LOGIC_VECTOR (6 downto 0); -- to the 7 seg LEDs
    anodes_out    : out STD_LOGIC_VECTOR (3 downto 0); -- to the 7 seg LEDs
    leds_out      : out STD_LOGIC_VECTOR (7 downto 0); -- to 8 LEDs (leftmost = Flash status, next = MIPS_ck, 6 right ones = version number)
    -- RDBK signals
    rdbk0         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk1         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk2         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk3         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk4         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk5         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk6         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk7         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk8         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk9         : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk10        : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk11        : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk12        : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk13        : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk14        : in  STD_LOGIC_VECTOR (31 downto 0);
    rdbk15        : in  STD_LOGIC_VECTOR (31 downto 0)
);
END COMPONENT;

-- put your components declarations here

COMPONENT Fetch_Unit is
Port (
    -- general input signals
    CK_25MHz          : in  STD_LOGIC;
    RESET_in          : in  STD_LOGIC;
    HOLD_in           : in  STD_LOGIC;
    -- MIPS signals
    IR_reg_pID        : out STD_LOGIC_VECTOR (31 downto 0); -- The IR_reg (instruction) to be used in ID
    sext_imm_pID      : out STD_LOGIC_VECTOR (31 downto 0); -- The sext_imm to be used in ID
    PC_reg_pIF        : out STD_LOGIC_VECTOR (31 downto 0); -- The PC_reg value in IF. To be read by TB in simulation and rdbk in implementation - for verific
    PC_plus_4_pID_out : out STD_LOGIC_VECTOR (31 downto 0); -- The PC_plus_4 value in ID --@@@HW6 JAL support - this is the address to be written to $ra ($31) in the
    Rs_equals_Rt_pID  : in  STD_LOGIC; -- '1' if value read from Rs equals the value read from Rt, '0' otherwise. Used in branch instructions.
    jr_adrs_in        : in  STD_LOGIC_VECTOR (31 downto 0); -- @@@HW6 JR support -- the value to be loaded into the PC in jr instruction
    -- IMem signals
    MIPS_IMem_adrs    : out STD_LOGIC_VECTOR (31 downto 0);
    MIPS_IMem_rd_data : in  STD_LOGIC_VECTOR (31 downto 0)
);
END COMPONENT;

COMPONENT GPR is
Port (
    -- RST    : in  STD_LOGIC;
    CK        : in  STD_LOGIC;
    rd_reg1   : in  STD_LOGIC_VECTOR (4 downto 0);  -- Rs
    rd_reg2   : in  STD_LOGIC_VECTOR (4 downto 0);  -- Rt
    wr_reg    : in  STD_LOGIC_VECTOR (4 downto 0);  -- Rd (in R-Type instruction, Rt in LW)
    rd_data1  : out STD_LOGIC_VECTOR (31 downto 0); -- Rs contents
    rd_data2  : out STD_LOGIC_VECTOR (31 downto 0); -- Rt contents
    wr_data   : in  STD_LOGIC_VECTOR (31 downto 0); -- contents to be written into Rd (or Rt)
    Reg_Write : in  STD_LOGIC; -- "0" means no register is written into
    GPR_hold  : in  STD_LOGIC  -- "1" means no register is written into
);
end COMPONENT;

COMPONENT MIPS_ALU is
Port (
    -- ALU operation control inputs
    ALUOP    : in  STD_LOGIC_VECTOR (1 downto 0); -- 00=add, 01=sub, 10=by Function
    Funct    : in  STD_LOGIC_VECTOR (5 downto 0); -- 32=ADD, 34=sub, 36=AND, 37=OR, 38=XOR, 42=SLT
    -- data inputs & data control inputs
    A_in     : in  STD_LOGIC_VECTOR (31 downto 0);
    B_in     : in  STD_LOGIC_VECTOR (31 downto 0);
    sext_imm : in  STD_LOGIC_VECTOR (31 downto 0);
    ALUsrcB  : in  STD_LOGIC;
    -- data output
    ALU_out  : out STD_LOGIC_VECTOR (31 downto 0)
);
end COMPONENT;

-- signals connecting the components, inputs & external logic
-- Reset and CK signals
signal CK                   : STD_LOGIC := '0';
signal RESET                : STD_LOGIC := '0'; -- The main RESET signal combined from switches in & MIPS_reset
signal HOLD                 : STD_LOGIC := '0'; -- The main RESET signal combined from switches in & MIPS_reset
signal RESET_from_Host_Intf : STD_LOGIC; -- is coming from the BYOC_Host_intf

-- Flash data bus signals (used to connect to the Flash_data "inout" pin)
signal data_from_Flash : STD_LOGIC_VECTOR (15 downto 0);
signal data_to_Flash   : STD_LOGIC_VECTOR (15 downto 0);
-- Flash control signals
signal Flash_ce_n_line : STD_LOGIC;
signal Flash_we_n_line : STD_LOGIC;
signal Flash_oe_n_line : STD_LOGIC;

signal Flash_rp_n_in_BYOC : STD_LOGIC; -- '0' when resetting Flash mem
signal Flash_sts_in_BYOC  : STD_LOGIC; -- '1' when Flash mem FSM is done

signal leds_out_from_host_intf : STD_LOGIC_VECTOR (7 downto 0); -- 7=Flash_stts, 6=MIPS_ck, 5-0=Host_intf version

-- Your design signals

-- ========== MIPS signals ==========
-- almost all signals are inside the Fetch Unit
-- except IMem signals:
signal IMem_adrs    : STD_LOGIC_VECTOR (31 downto 0);
signal IMem_rd_data : STD_LOGIC_VECTOR (31 downto 0);

-- and we have the PC_reg (PC_reg_pIF) coming out of the Fetch_Unit for rdbk to Host_Intf & TB
signal PC_reg : STD_LOGIC_VECTOR (31 downto 0);

signal PC_plus_4_pID : STD_LOGIC_VECTOR (31 downto 0); -- @@@HW6 changes to support JAL instruction

-- ID phase (a register with valid value along the ID phase)
signal IR_reg : STD_LOGIC_VECTOR (31 downto 0);
-- IR reg signals (valid in ID phase)
signal Opcode : STD_LOGIC_VECTOR (5 downto 0); -- IR[31:26]
signal Rs     : STD_LOGIC_VECTOR (4 downto 0); -- IR[25:21]
signal Rt     : STD_LOGIC_VECTOR (4 downto 0); -- IR[20:16]
signal Rd     : STD_LOGIC_VECTOR (4 downto 0); -- IR[15:11]
signal Funct  : STD_LOGIC_VECTOR (5 downto 0); -- IR[5:0]

signal rt_tmp : STD_LOGIC_VECTOR (4 downto 0);

-- other signals active in ID phase
signal sext_imm     : STD_LOGIC_VECTOR (31 downto 0);
signal GPR_rd_data1 : STD_LOGIC_VECTOR (31 downto 0);
signal GPR_rd_data2 : STD_LOGIC_VECTOR (31 downto 0);
signal Rs_equals_Rt : STD_LOGIC; -- '1' if contents of Rs equals the contents of Rt, '0' if not.

-- @@@HW6 - add JR support
signal jr_address : STD_LOGIC_VECTOR (31 downto 0); -- the Rs value (usually from GPR_rd_data1) to be loaded into the PC in jr instruction

-- @@@HW6 - adding branch forwarding
signal GPR_rd_data1_wt_fwd : STD_LOGIC_VECTOR (31 downto 0); --@@@HW6 adding branch forwarding
signal GPR_rd_data2_wt_fwd : STD_LOGIC_VECTOR (31 downto 0); --@@@HW6 adding branch forwarding

-- MIPS control signals - created at the ID phase
-- Decoded signals for EX phase
signal ALUsrcB  : STD_LOGIC; -- '0' selects A_reg, '1' selects sext_imm
signal ALUOP    : STD_LOGIC_VECTOR (1 downto 0); -- 00=add, 01=sub, 10=by Function --@@@HW6 11=or to support ORI instruction
signal RegDst   : STD_LOGIC; -- '0' selects Rt, '1' selects Rd
-- Decoded signals for MEM phase
signal MemWrite : STD_LOGIC; -- '1' for writing to the DMem
-- Decoded signals for WB phase
signal RegWrite : STD_LOGIC; -- '1' for writing to the GPR file
signal MemToReg : STD_LOGIC; -- '1' for writing MDR data to the GPR file, '0' for writing ALUout_reg_pWB data to the GPR file
```

```
signal JAL : STD_LOGIC; -- '1' in JAL instruction -- @@@HW6 - adding JAL instruction

----- EX phase ------
--=======
-- Registers valid in EX phase
signal A_reg        : STD_LOGIC_VECTOR (31 downto 0);
signal B_reg        : STD_LOGIC_VECTOR (31 downto 0);
signal sext_imm_reg : STD_LOGIC_VECTOR (31 downto 0);
signal Rt_pEX       : STD_LOGIC_VECTOR (4 downto 0);
signal Rd_pEX       : STD_LOGIC_VECTOR (4 downto 0);
signal ALU_output   : STD_LOGIC_VECTOR (31 downto 0);

signal PC_plus_4_pEX : STD_LOGIC_VECTOR (31 downto 0); --@@@HW6 - adding JAL instruction

signal A_reg_wt_fwd : STD_LOGIC_VECTOR (31 downto 0); --@@@HW6 - adding data forwarding
signal B_reg_wt_fwd : STD_LOGIC_VECTOR (31 downto 0); --@@@HW6 - adding data forwarding
signal Rs_pEX       : STD_LOGIC_VECTOR (4 downto 0);  --@@@HW6 - adding data forwarding

-- MIPS control signals - created at the ID phase - delayed to EX phase
-- Decoded signals for EX phase
signal ALUsrcB_pEX  : STD_LOGIC;
signal Funct_pEX    : STD_LOGIC_VECTOR (5 downto 0); -- IR[5:0]
signal ALUOP_pEX    : STD_LOGIC_VECTOR (1 downto 0);
signal RegDst_pEX   : STD_LOGIC;
signal RegWrite_pEX : STD_LOGIC;
signal MemWrite_pEX : STD_LOGIC;
signal MemToReg_pEX : STD_LOGIC;

signal JAL_pEX : STD_LOGIC; --@@@HW6 adding JAL instruction

-----
-- Registers valid in MEM phase
signal B_reg_pMEM : STD_LOGIC_VECTOR (31 downto 0);
signal Rd_pMEM    : STD_LOGIC_VECTOR (4 downto 0);
signal ALUout_reg : STD_LOGIC_VECTOR (31 downto 0);
signal PC_plus_4_pMEM : STD_LOGIC_VECTOR (31 downto 0); --@@@HW6 - adding JAL instruction

-- MIPS control signals - created at the ID phase - delayed to MEM phase

-- Decoded signals for MEM phase
signal RegWrite_pMEM : STD_LOGIC;
signal MemWrite_pMEM : STD_LOGIC;
signal MemToReg_pMEM : STD_LOGIC;

signal JAL_pMEM : STD_LOGIC; --@@@HW6 adding JAL instruction

-----
-- Registers valid in WB phase
signal MDR_reg        : STD_LOGIC_VECTOR (31 downto 0); -- renaming of the MIPS_DMem_rd_data signal
signal ALUout_reg_pWB : STD_LOGIC_VECTOR (31 downto 0);
signal GPR_wr_data    : STD_LOGIC_VECTOR (31 downto 0);
signal Rd_pWB         : STD_LOGIC_VECTOR (4 downto 0);
signal PC_plus_4_pWB  : STD_LOGIC_VECTOR (31 downto 0); --@@@HW6 adding JAL instruction

-- signals valid in WB phase
-- MIPS control signals - created at the ID phase - delayed to WB phase
-- Decoded signals for WB phase
signal RegWrite_pWB : STD_LOGIC;
signal MemToReg_pWB : STD_LOGIC;

signal JAL_pWB : STD_LOGIC; --@@@HW6 adding JAL instruction

-- ======= End of MIPS signals ===================
--

-- Host Intf signals

signal rdbk3_vec  : STD_LOGIC_VECTOR (31 downto 0);
signal rdbk4_vec  : STD_LOGIC_VECTOR (31 downto 0);
signal rdbk5_vec  : STD_LOGIC_VECTOR (31 downto 0);
signal rdbk12_vec : STD_LOGIC_VECTOR (31 downto 0);

-- ***********************************

begin
```


```

-- Component connections
-- Connect all components used: Clock_Driver, BYOC_Host_intf, your components ...
-- ==========
-- Connecting the Clock_Driver
clock_divider : Clock_Driver
port map
(
    CK_50MHz_IN  => CK_50MHz, -- directly from the HW_MIPS i/o pin
    CK_25MHz_OUT => CK        -- the CK signal to the entire HW4_MIPS design
);

-- Connecting the HW4_Host_intf
hostintf : BYOC_Host_intf
Port Map(
    --==== The student's part =============
    -- MIPS signals [to be used by students]
    MIPS_reset => RESET_from_host_intf, -- The Host_intf drives the RESET signal
    MIPS_hold  => HOLD,                 -- The Host_intf also drives the HOLD signal
    -- MIPS IMem signals
    MIPS_IMem_adrs    => IMem_adrs,     -- driven by the Fetch_Unit
    MIPS_IMem_rd_data => IMem_rd_data,  -- driven by the Host_intf and sent to the Fetch_Unit
    -- MIPS_DMem signals
    MIPS_DMem_we      => MemWrite_pMEM, -- '1' if we want to write into DMem at the next rising edge of the MIPS_ck (for sw instruction)
    MIPS_DMem_adrs    => ALUout_reg,    -- driven by the ALUout_reg = The address to DMem
    MIPS_DMem_wr_data => B_reg_pMEM,    -- The data to be written into DMem_adrs in sw instruction
    MIPS_DMem_rd_data => MDR_reg,       -- The data read from DMem_adrs in lw instruction. It is registered, i.e. = the MDR data
    --
    ------ Other signals to be directed to i/o pins ------
    -- Flash Mem signals
    Flash_adrs    => Flash_adrs,
    Flash_ce_n    => Flash_ce_n_line,
    Flash_we_n    => Flash_we_n_line,
    Flash_oe_n    => Flash_oe_n_line,
    Flash_rp_n    => Flash_rp_n_in_BYOC,
    Flash_sts     => Flash_sts,
    Flash_data    => data_from_Flash,
    Flash_wr_data => data_to_Flash,
    --
    -- Infrastructure signals [To be used by PC via RS232 or from Nexys2 board switches & pushbuttons, and VGA signals to the screen],
    -- Host intf signals
    RS232_Rx => RS232_Rx,
    RS232_Tx => RS232_Tx,
    -- VGA signals
    VGA_h_sync => VGA_h_sync,
    VGA_v_sync => VGA_v_sync,
    VGA_red0   => VGA_red0,
    VGA_red1   => VGA_red1,
    VGA_red2   => VGA_red2,
    VGA_grn0   => VGA_grn0,
    VGA_grn1   => VGA_grn1,
    VGA_grn2   => VGA_grn2,
    VGA_blu1   => VGA_blu1,
    VGA_blu2   => VGA_blu2,
    --PS2 kbd signals
    PS2_kbd_ck   => PS2C,
    PS2_kbd_data => PS2D,
    --
    --general signals
    CK_25MHz   => CK, -- CK_25MHz from the Clock_Driver
    buttons_in => buttons_in,
               => switches_in,
               => sevenseg_out,
               => anodes_out,
               => leds_out_from_host_intf,
    --======== additional part for student ========================
    -- RDBK signals
    rdbk0  => PC_reg,
    rdbk1  => IR_reg,
    rdbk2  => sext_imm,
    rdbk3  => rdbk3_vec,
    rdbk4  => rdbk4_vec,
    rdbk5  => rdbk5_vec,
    rdbk6  => A_reg,
    rdbk7  => B_reg,
    rdbk8  => sext_imm_reg,
    rdbk9  =>
    rdbk10 => ALUout_reg,
    rdbk11 => B_reg_pMEM,
    rdbk12 => rdbk12_vec,
    rdbk13 => MDR_reg,
    rdbk14 => ALUout_reg_pWB,
    rdbk15 => GPR_wr_data
);

-- Connecting the Fetch_Unit
--
fetch_unit_imp : Fetch_Unit
Port map (
    -- general input signals
    CK_25MHz     => CK,
    RESET_in     => RESET,
    HOLD_in      => HOLD,
    IR_reg_pID   => IR_reg,   -- connecting IR_reg_pID to the signal called IR_reg
    sext_imm_pID => sext_imm, -- same for the signal called sext_imm
```


```
    PC_reg_pIF        => PC_reg,
    PC_plus_4_pID_out => PC_plus_4_pID, --@@@HW6 for JR support
    Rs_equals_Rt_pID  => Rs_equals_Rt,
    jr_adrs_in        => jr_address,    --@@@HW6 for JR support
    --- IMem signals
    MIPS_IMem_adrs    => IMem_adrs,
    MIPS_IMem_rd_data => IMem_rd_data
);

-- Connecting the GPR file
--
GPR_file : GPR
Port map (
    --RST     => not connected
    CK        => CK,
    rd_reg1   => Rs,
    rd_reg2   => Rt,
    wr_reg    => Rd_pWB,
    rd_data1  => GPR_rd_data1,
    rd_data2  => GPR_rd_data2,
    wr_data   => GPR_wr_data,
    Reg_Write => RegWrite_pWB,
    - com hold =>
);

-- Connecting the MIPS_ALU
-- ------
ALU : MIPS_ALU
Port map (
    -- ALU operation control inputs
    ALUOP    => ALUOP_pEX,
    Funct    => Funct_pEX,
    -- data inputs & data control inputs
    A_in     => A_reg_wt_fwd, -- @@@HW6 should be A_reg_wt_fwd for adding data forwarding in EX phase
    B_in     => B_reg_wt_fwd, -- @@@HW6 should be B_reg_wt_fwd for adding data forwarding in EX phase
    sext_imm => sext_imm_reg,
    ALUsrcB  => ALUsrcB_pEX,
    -- data output
    ALU_out  => ALU_output
);


-- all signal equations

-- Signals to external components
-- disconnecting the Mobile SRAM
MT_ce_n <= '1'; -- making sure that the SRAM is not active

-- connecting Flash_data bidir signal
data_from_Flash <= Flash_data;
Flash_data <= data_to_Flash when (Flash_oe_n_line = '1' and Flash_ce_n_line = '0') else (others => 'Z');

-- connecting other Flash signals
Flash_ce_n <= Flash_ce_n_line;
Flash_oe_n <= Flash_oe_n_line;
Flash_we_n <= Flash_we_n_line;

Flash_rp_n <= Flash_rp_n_in_BYOC and ( not switches_in(4) );
Flash_sts_in_BYOC <= Flash_sts;

-- Leds_out(7) <= Flash_sts_in_BYOC;
leds_out <= Flash_sts_in_BYOC & leds_out_from_host_intf(6 downto 0); -- 7=Flash_stts, 6=MIPS_ck, 5-0=Host_intf version

-- General signals
-- ======
RESET <= switches_in(6) or RESET_from_Host_Intf;

-- Here is your part, i.e., your equations
-- ========
-- no such processes. They are all inside the Fetch Unit

--
Opcode <= IR_reg(31 downto 26);
--Rs <= IR_reg(25 downto 21);
-----
--Rt <= IR_reg(20 downto 16); --@@@HW6 a change is required here to support JAL
-----
Rd    <= IR_reg(15 downto 11);
Funct <= IR_reg(5 downto 0);

--beq & bne & jr forwarding --@@@HW6 adding branch & JR forwarding
-- A mux of the Rs_equal_Rt comparator (beq/bne forwarding)
process (RegWrite_pMEM, Rd_pMEM, Rs, GPR_rd_data1, ALUout_reg)
begin
    if RegWrite_pMEM = '1' and Rd_pMEM = Rs and Rs /= b"00000" then
        GPR_rd_data1_wt_fwd <= ALUout_reg;
    else
        GPR_rd_data1_wt_fwd <= GPR_rd_data1;
    end if;
end process;

--B mux of the Rs_equal_Rt comparator (beq/bne forwarding) --@@@HW6 adding branch & JR forwarding
process (RegWrite_pMEM, Rd_pMEM, Rt, GPR_rd_data2, ALUout_reg)
```

```
begin
    if RegWrite_pMEM = '1' and Rd_pMEM = Rt and Rt /= b"00000" then
        GPR_rd_data2_wt_fwd <= ALUout_reg;
    else
        GPR_rd_data2_wt_fwd <= GPR_rd_data2;
    end if;
end process;

--beq/bne comparator --@@@HW6 adding branch forwarding means a change here
process (GPR_rd_data1_wt_fwd, GPR_rd_data2_wt_fwd)
begin
    if GPR_rd_data1_wt_fwd = GPR_rd_data2_wt_fwd then
        Rs_equals_Rt <= '1';
    else
        Rs_equals_Rt <= '0';
    end if;
end process;

----@@@HW6 add JR support -- HW6 adding JR forwarding means a change here
jr_address <= GPR_rd_data1_wt_fwd;

-- Control decoder - calculates the signals in ID phase
-- creates the following signals according to the opcode:
--   ALUsrcB   '0' - selects B_reg, '1' - selects sext_imm_reg
--   ALUOP     b"00" - add, b"01" - sub, b"10" - the Function field determines the ALU operation, b"11" - or --@@@HW6 adding ORI support
--   RegDst    '1' - "Rd"=Rd (write to Rd - in Rtype inst. only), '0' - "Rd"=Rt (write to Rt - in all other instructions)
--   MemWrite  '1' - write to DMem
--   MemToReg  '0' - write ALUout_reg data (to "Rd"), '1' - write MDR_reg data (to "Rd")
--   RegWrite  '1' - write to GPR file (to "Rd")
--   JAL       '1' - when we are in jal instruction --@@@HW6 adding JAL support
process (Opcode, IR_reg)
begin
    if Opcode = 8 or Opcode = 13 or Opcode = 15 or Opcode = 35 or Opcode = 43 then --addi or ori or lui or lw or sw
        ALUsrcB <= '1';
    else
        ALUsrcB <= '0';
    end if;

    if Opcode = 0 then --rtype
        ALUOP <= b"10";
    elsif Opcode = 4 or Opcode = 5 then --beq and bne
        ALUOP <= b"01";
    elsif Opcode = 13 then -- ori
        ALUOP <= b"11";
    else
        ALUOP <= b"00";
    end if;

    if Opcode = 0 then --rtype
        RegDst <= '1';
    else
        RegDst <= '0';
    end if;

    if Opcode = 43 then --sw
        MemWrite <= '1';
    else
        MemWrite <= '0';
    end if;

    if Opcode = 35 then -- lw
        MemToReg <= '1';
    else
        MemToReg <= '0';
    end if;

    if Opcode = 0 or Opcode = 8 or Opcode = 13 or Opcode = 15 or Opcode = 3 or Opcode = 35 then --rtype or addi or ori or lui or jal or lw
        RegWrite <= '1';
    else
        RegWrite <= '0';
    end if;

    if Opcode = 3 then --jal
        JAL <= '1';
        Rt <= b"11111";
    else
        JAL <= '0';
        Rt <= IR_reg(20 downto 16);
    end if;

    if Opcode = 15 then --lui
        Rs <= b"00000";
    else
        Rs <= IR_reg(25 downto 21);
    end if;

end process;

-- ------
-- A & B registers
process (RESET, CK)
begin
    if RESET='1' then
        A_reg <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        A_reg <= GPR_rd_data1;
    end if;
end process;
```

```

process (RESET, CK)
begin
    if RESET='1' then
        B_reg <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        B_reg <= GPR_rd_data2;
    end if;
end process;

-- with forwarding -- @@@HW6 adding data forwarding
-- src_A mux (forwarding) -- @@@HW6 adding data forwarding in EX phase
process (RegWrite_pMEM, Rd_pMEM, Rs_pEX, RegWrite_pWB, Rd_pWB, JAL_pMEM, GPR_wr_data, ALUout_reg, A_reg)
begin
    if RegWrite_pMEM = '1' and Rd_pMEM = Rs_pEX and Rs_pEX /= b"00000" and JAL_pMEM = '0' then
        A_reg_wt_fwd <= ALUout_reg;
    elsif RegWrite_pWB = '1' and Rd_pWB = Rs_pEX and Rs_pEX /= b"00000" then
        A_reg_wt_fwd <= GPR_wr_data;
    else
        A_reg_wt_fwd <= A_reg;
    end if;
end process;

-- src B mux (forwarding part) -- @@@HW6 adding data forwarding in EX phase
process (RegWrite_pMEM, Rd_pMEM, Rt_pEX, RegWrite_pWB, Rd_pWB, JAL_pMEM, GPR_wr_data, ALUout_reg, B_reg)
begin
    if RegWrite_pMEM = '1' and Rd_pMEM = Rt_pEX and Rt_pEX /= b"00000" and JAL_pMEM = '0' then
        B_reg_wt_fwd <= ALUout_reg;
    elsif RegWrite_pWB = '1' and Rd_pWB = Rt_pEX and Rt_pEX /= b"00000" then
        B_reg_wt_fwd <= GPR_wr_data;
    else
        B_reg_wt_fwd <= B_reg;
    end if;
end process;


-- sext_imm register
process (RESET, CK)
begin
    if RESET='1' then
        sext_imm_reg <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        sext_imm_reg <= sext_imm;
    end if;
end process;

-- Rs register --@@@HW6 added for data forwarding support
process (RESET, CK)
begin
    if RESET='1' then
        Rs_pEX <= b"00000";
    elsif CK'event and CK='1' and HOLD ='0' then
        Rs_pEX <= Rs;
    end if;
end process;

-- Rt register
process (RESET, CK)
begin
    if RESET='1' then
        Rt_pEX <= b"00000";
    elsif CK'event and CK='1' and HOLD ='0' then
        Rt_pEX <= Rt;
    end if;
end process;

-- Rd register
process (RESET, CK)
begin
    if RESET='1' then
        Rd_pEX <= b"00000";
    elsif CK'event and CK='1' and HOLD ='0' then
        Rd_pEX <= Rd;
    end if;
end process;

-- PC_plus_4_pEX --@@@HW6 added to support JAL instruction
process (RESET, CK)
begin
    if RESET='1' then
        PC_plus_4_pEX <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        PC_plus_4_pEX <= PC_plus_4_pID;
    end if;
end process;

-- control signals regs --@@@HW6 add JAL support here too
process (RESET, CK)
begin
    if RESET='1' then
        ALUsrcB_pEX  <= '0';
        Funct_pEX    <= b"000000";
        ALUOP_pEX    <= b"00";
        RegDst_pEX   <= '0';
        RegWrite_pEX <= '0';
        MemWrite_pEX <= '0';
        MemToReg_pEX <= '0';
        JAL_pEX      <= '0';
    elsif CK'event and CK='1' and HOLD ='0' then
        ALUsrcB_pEX  <= ALUsrcB;
        Funct_pEX    <= Funct;
        ALUOP_pEX    <= ALUOP;
        RegDst_pEX   <= RegDst;
        RegWrite_pEX <= RegWrite;
        MemWrite_pEX <= MemWrite;
```

```
        MemToReg_pEX <= MemToReg;
        JAL_pEX      <= JAL;
    end if;
end process;
-- RegWrite_pEX, MemToReg_pEX, MemWrite_pEX FFs

-- =
process (RESET, CK)
begin
    if RESET='1' then
        ALUout_reg <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        ALUout_reg <= ALU_output;
    end if;
end process;
process (RESET, CK)
begin
    if RESET='1' then
        B_reg_pMEM <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        if RegWrite_pMEM = '1' and Rd_pMEM = Rt_pEX and Rt_pEX /= b"00000" and JAL_pMEM = '0' then
            B_reg_pMEM <= B_reg_wt_fwd;
        elsif RegWrite_pWB = '1' and Rd_pWB = Rt_pEX and Rt_pEX /= b"00000" then
            B_reg_pMEM <= B_reg_wt_fwd;
        else
            B_reg_pMEM <= B_reg;
        end if;
    end if;
end process;

-- RegDst mux and Rd_pMEM register
process (RESET, CK, RegDst_pEX, Rt_pEX, Rd_pEX)
begin
    if RESET='1' then
        Rd_pMEM <= b"00000";
    elsif CK'event and CK='1' and HOLD ='0' then
        if RegDst_pEX = '0' then
            Rd_pMEM <= Rt_pEX;
        else
            Rd_pMEM <= Rd_pEX;
        end if;
    end if;
end process;
-- PC_plus_4_pMEM reg --@@@HW6 added to support JAL instruction
process (RESET, CK)
begin
    if RESET='1' then
        PC_plus_4_pMEM <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        PC_plus_4_pMEM <= PC_plus_4_pEX;
    end if;
end process;

-- control signals FFs
-- RegWrite_pMEM, MemToReg_pMEM, MemWrite_pMEM FFs --@@@HW6 add JAL_pMEM to support JAL
process (RESET, CK)
begin
    if RESET='1' then
        RegWrite_pMEM <= '0';
        MemToReg_pMEM <= '0';
        MemWrite_pMEM <= '0';
        JAL_pMEM      <= '0';
    elsif CK'event and CK='1' and HOLD ='0' then
        RegWrite_pMEM <= RegWrite_pEX;
        MemToReg_pMEM <= MemToReg_pEX;
        MemWrite_pMEM <= MemWrite_pEX;
        JAL_pMEM      <= JAL_pEX;
    end if;
end process;

--
-- MDR_reg - no need to define -- connected directly from BYOC_Host_intf - resides inside the DMem

--ALUout_pWB register
process (RESET, CK)
begin
    if RESET='1' then
        ALUout_reg_pWB <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        ALUout_reg_pWB <= ALUout_reg;
    end if;
end process;

-- MemToReg mux --@@@HW6 requires changes to support JAL instruction
process (MemToReg_pWB, MDR_reg, ALUout_reg_pWB, PC_plus_4_pWB, JAL_pWB)
begin
    if JAL_pWB = '1' then
        GPR_wr_data <= PC_plus_4_pWB;
    elsif MemToReg_pWB = '1' then
        GPR_wr_data <= MDR_reg;
    else
        GPR_wr_data <= ALUout_reg_pWB;
    end if;
end process;

-- Rd_pWB register
```

```
process (RESET, CK)
begin
    if RESET='1' then
        Rd_pWB <= b"00000";
    elsif CK'event and CK='1' and HOLD ='0' then
        Rd_pWB <= Rd_pMEM;
    end if;
end process;

-- PC_plus_4_pWB --@@@HW6 added to support JAL instruction
process (RESET, CK)
begin
    if RESET='1' then
        PC_plus_4_pWB <= x"00000000";
    elsif CK'event and CK='1' and HOLD ='0' then
        PC_plus_4_pWB <= PC_plus_4_pMEM;
    end if;
end process;

-- control signals FFs
-- RegWrite_pWB, MemToReg_pWB FFs --@@@HW6 added JAL_pWB FF to support JAL instruction
process (RESET, CK)
begin
    if RESET='1' then
        RegWrite_pWB <= '0';
        MemToReg_pWB <= '0';
        JAL_pWB      <= '0';
    elsif CK'event and CK='1' and HOLD ='0' then
        RegWrite_pWB <= RegWrite_pMEM;
        MemToReg_pWB <= MemToReg_pMEM;
        JAL_pWB      <= JAL_pMEM;
    end if;
end process;

--build special rdbk signals
rdbk3_vec <= b"000" & Rs & b"000" & Rt & b"000" & Rd & b"00" & Funct;
982 rdbk4_vec <= b"000" & RegWrite & b"0000" & b"00000000" & b"00000000" & b"0000 & RegWrite & b"000" & RegWrite & b"000" & RegWrite & b"000" & BegWrite & b"0000" & BegWrite & b"000" & RegWrite & b"000" & BegWrite & b"000" & RegWrite & b"0000" & RegWrite & b"

-- ******************

end Behavioral;
```
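
The two EX-phase forwarding muxes in the listing above (`src_A` and `src_B`) share the same priority rule: a pending write in the MEM phase wins over one in the WB phase, register 0 is never forwarded, and a `jal` sitting in MEM is skipped because its result is not in `ALUout_reg`. As a minimal sketch of that rule only, not part of the original design, it could be factored into one reusable entity. The entity name `fwd_mux` and all of its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical helper entity (not in the original project).
-- It mirrors the priority used by the src_A / src_B forwarding processes.
entity fwd_mux is
    Port (
        src_reg      : in  STD_LOGIC_VECTOR (4 downto 0);  -- Rs_pEX or Rt_pEX
        reg_value    : in  STD_LOGIC_VECTOR (31 downto 0); -- A_reg or B_reg
        RegWrite_MEM : in  STD_LOGIC;
        JAL_MEM      : in  STD_LOGIC;
        Rd_MEM       : in  STD_LOGIC_VECTOR (4 downto 0);
        data_MEM     : in  STD_LOGIC_VECTOR (31 downto 0); -- ALUout_reg
        RegWrite_WB  : in  STD_LOGIC;
        Rd_WB        : in  STD_LOGIC_VECTOR (4 downto 0);
        data_WB      : in  STD_LOGIC_VECTOR (31 downto 0); -- GPR_wr_data
        value_out    : out STD_LOGIC_VECTOR (31 downto 0)
    );
end fwd_mux;

architecture Behavioral of fwd_mux is
begin
    process (src_reg, reg_value, RegWrite_MEM, JAL_MEM, Rd_MEM, data_MEM,
             RegWrite_WB, Rd_WB, data_WB)
    begin
        if RegWrite_MEM = '1' and Rd_MEM = src_reg and src_reg /= "00000" and JAL_MEM = '0' then
            value_out <= data_MEM;   -- newest result (MEM phase) has priority
        elsif RegWrite_WB = '1' and Rd_WB = src_reg and src_reg /= "00000" then
            value_out <= data_WB;    -- older result (WB phase)
        else
            value_out <= reg_value;  -- no hazard: use the registered operand
        end if;
    end process;
end Behavioral;
```

Under these assumptions, two instances (one fed with `Rs_pEX`/`A_reg`, one with `Rt_pEX`/`B_reg`) would produce the same values as `A_reg_wt_fwd` and `B_reg_wt_fwd`.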

```
--
-- This module is the Fetch Unit
--
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity Fetch_Unit is
Port (
    --
    CK_25MHz          : in  STD_LOGIC;
    RESET_in          : in  STD_LOGIC;
    HOLD_in           : in  STD_LOGIC;
    -- IMem signals
    MIPS_IMem_adrs    : out STD_LOGIC_VECTOR (31 downto 0);
    MIPS_IMem_rd_data : in  STD_LOGIC_VECTOR (31 downto 0);
    IR_reg_pID        : out STD_LOGIC_VECTOR (31 downto 0);
    sext_imm_pID      : out STD_LOGIC_VECTOR (31 downto 0);
    PC_reg_pIF        : out STD_LOGIC_VECTOR (31 downto 0);
    Rs_equals_Rt_pID  : in  STD_LOGIC;
    jr_adrs_in        : in  STD_LOGIC_VECTOR (31 downto 0); --@@@HW6
    PC_plus_4_pID_out : out STD_LOGIC_VECTOR (31 downto 0)  --@@@HW6
);
end Fetch_Unit;


architecture Behavioral of Fetch_Unit is

------
signal RESET        : STD_LOGIC; -- is coming directly from the Fetch_Unit_Host_intf
signal CK           : STD_LOGIC; -- is coming directly from the Fetch_Unit_Host_intf
signal HOLD         : STD_LOGIC; -- is coming directly from the Fetch_Unit_Host_intf
signal IMem_adrs    : STD_LOGIC_VECTOR (31 downto 0);
signal IMem_rd_data : STD_LOGIC_VECTOR (31 downto 0);

--- IR & related signals
signal IR_reg   : STD_LOGIC_VECTOR (31 downto 0) := x"00000000";
signal imm      : STD_LOGIC_VECTOR (15 downto 0);
signal sext_imm : STD_LOGIC_VECTOR (31 downto 0);
signal opcode   : STD_LOGIC_VECTOR (5 downto 0);
signal funct    : STD_LOGIC_VECTOR (5 downto 0);

-- PC
signal PC_reg : STD_LOGIC_VECTOR (31 downto 0) := x"00000000";

-- PC_mux
-- control signal of PC mux
signal PC_Source : STD_LOGIC_VECTOR (1 downto 0); -- 0=PC+4, 1=BRANCH, 2=JR, 3=JUMP
```

```
-- input signals to PC mux
signal PC_plus_4   : STD_LOGIC_VECTOR (31 downto 0);
signal jump_adrs   : STD_LOGIC_VECTOR (31 downto 0);
signal branch_adrs : STD_LOGIC_VECTOR (31 downto 0);
signal jr_adrs     : STD_LOGIC_VECTOR (31 downto 0);
-- output
signal PC_mux_out  : STD_LOGIC_VECTOR (31 downto 0);

signal PC_plus_4_pID : STD_LOGIC_VECTOR (31 downto 0);

-- additional "complex" rdbk signals
signal rdbk_vec1 : STD_LOGIC_VECTOR (31 downto 0);
signal rdbk_vec2 : STD_LOGIC_VECTOR (31 downto 0);

begin

-- Connecting the Fetch_Unit pins to inner signals
-- MIPS signals [to be used by students]
CK             <= CK_25MHz;
RESET          <= RESET_in;
HOLD           <= HOLD_in;
MIPS_IMem_adrs <= IMem_adrs;
IMem_rd_data   <= MIPS_IMem_rd_data;
-- RDBK signals [to be used by students]

--
IR_reg_pID   <= MIPS_IMem_rd_data;
sext_imm_pID <= sext_imm;
PC_reg_pIF   <= PC_reg;

-- ------ IF phase processes ------
-- ------
-- PC register
process (RESET, CK, opcode)
begin
    if RESET='1' then
        PC_reg <= x"00400000";
    elsif CK'event and CK='1' and HOLD ='0' then
        if opcode = 2 or opcode = 3 then
            PC_reg <= jump_adrs; --@@@HW6
        else
            PC_reg <= PC_mux_out;
        end if;
    end if;
end process;

IMem_adrs <= PC_reg; -- connect PC_reg to IMem_adrs

-- PC source mux
process (PC_Source, PC_plus_4, branch_adrs, jr_adrs, jump_adrs)
```

```
140 begin
        if PC Source = b"00" then --PC plus 4
141
142
            PC mux out <= PC plus 4;
143
                PC Source = b"01" then --branch adrs
144
            PC mux out <= branch adrs;
145
        elsif PC Source = b"10" then --jr instruction
146
            PC mux out <= jr adrs;
147
        elsif PC Source = b"11" then --jump adrs
148
            PC mux out <= jump adrs;
149
        end if;
150 end process;
151
152 -- PC Adder - incrementing PC by 4 (create the PC_plus_4 signal)
153 PC_plus_4 <= PC_reg + 4;
154
155
                (rename of the IMem_rd_data signal)
156 -- IR reg
157 IR_reg <= IMem_rd_data;</pre>
158
159 imm <= IR_reg(15 downto 0);
160
161
162 -- imm sign extension
                               (create the sext imm signal)
163 process (imm, opcode)
164 begin
165
        if opcode = 15 then -- lui
166
            sext_imm <= imm & x"0000"; --@@@HW6</pre>
167
        elsif opcode /= 13 then -- @@@HW6 not ori
168
            if imm(15) = '0' then
                sext_imm <= x"0000" & imm;</pre>
169
            elsif imm(15) = '1' then
170
                sext imm <= x"FFFF" & imm;</pre>
171
172
            end if;
173
        else
174
            sext imm <= x"0000" & imm;</pre>
175
        end if;
176 end process;
177
178 -- BRANCH address (create the branch adrs signal)
179 branch adrs <= (sext imm(29 downto 0) & b"00") + PC plus 4 pID;
181 -- JUMP address
                      (create the jump adrs signal)
182 jump_adrs <= PC_plus_4_pID(31 downto 28) & IR_reg(25 downto 0) & b"00";
183
184 -- JR address
                       (create the jr adrs signal)
185 jr_adrs <= jr_adrs_in; --@@@HW6
186 PC_plus_4_pID_out <= PC_plus_4_pID; --@@@HW6
187
-- PC_plus_4_pID register (create the PC_plus_4_pID signal)
process (RESET, CK)
begin
    if RESET = '1' then
        PC_plus_4_pID <= x"00000000";
    elsif CK'event and CK = '1' and HOLD = '0' then
        PC_plus_4_pID <= PC_plus_4;
    end if;
end process;

-- instruction decoder
opcode <= IR_reg(31 downto 26);
funct <= IR_reg(5 downto 0);

-- PC_source decoder (create the PC_source signal)
process (opcode, funct, Rs_equals_Rt_pID)
begin
    if opcode = b"000010" or opcode = b"000011" then -- j or jal
        PC_source <= b"11"; -- jump_adrs
    elsif opcode = b"000100" and Rs_equals_Rt_pID = '1' then -- beq
        PC_source <= b"01"; -- branch_adrs
    elsif opcode = b"000101" and Rs_equals_Rt_pID = '0' then -- bne
        PC_source <= b"01"; -- branch_adrs
    elsif opcode = b"000000" and funct = b"001000" then -- jr
        PC_source <= b"10"; -- jr instruction
    else -- any other
        PC_source <= b"00"; -- PC_plus_4
    end if;
end process;

-- your Fetch Unit code ends here <<<<

-- rdbk signals
rdbk_vec1 <= x"0000000" & b"00" & PC_source; -- add leading zeros to create 32 bit vec

end Behavioral;
```
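
The address arithmetic in the Fetch Unit listing above (sign extension of `imm`, `branch_adrs`, `jump_adrs`) can be checked in isolation. The following is a minimal self-checking sketch, not part of the original design. The entity `fetch_adrs_demo`, the helper functions `sign_extend16`, `branch_target` and `jump_target`, and all test values are hypothetical. It uses `ieee.numeric_std` for the arithmetic instead of the `+` on `STD_LOGIC_VECTOR` that the listing uses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo entity (not part of the original Fetch Unit).
entity fetch_adrs_demo is
end entity fetch_adrs_demo;

architecture sim of fetch_adrs_demo is

    -- Sign-extend a 16-bit immediate to 32 bits (the case that is neither lui nor ori).
    function sign_extend16(imm : std_logic_vector(15 downto 0))
        return std_logic_vector is
    begin
        return std_logic_vector(resize(signed(imm), 32));
    end function sign_extend16;

    -- Branch target: word offset shifted left by 2, added to PC+4.
    function branch_target(imm   : std_logic_vector(15 downto 0);
                           pc_p4 : std_logic_vector(31 downto 0))
        return std_logic_vector is
        variable sext    : std_logic_vector(31 downto 0);
        variable shifted : std_logic_vector(31 downto 0);
    begin
        sext    := sign_extend16(imm);
        shifted := sext(29 downto 0) & "00";
        return std_logic_vector(unsigned(shifted) + unsigned(pc_p4));
    end function branch_target;

    -- Jump target: top 4 bits of PC+4, the 26-bit instruction field, then "00".
    function jump_target(instr : std_logic_vector(31 downto 0);
                         pc_p4 : std_logic_vector(31 downto 0))
        return std_logic_vector is
    begin
        return pc_p4(31 downto 28) & instr(25 downto 0) & "00";
    end function jump_target;

begin

    check : process
    begin
        -- Positive immediate: the upper half is filled with zeros.
        assert sign_extend16(x"0004") = x"00000004"
            report "sign extension of a positive immediate" severity failure;
        -- Negative immediate: the upper half is filled with ones.
        assert sign_extend16(x"FFFC") = x"FFFFFFFC"
            report "sign extension of a negative immediate" severity failure;
        -- Forward branch by 3 words from PC+4 = 0x00400004.
        assert branch_target(x"0003", x"00400004") = x"00400010"
            report "forward branch target" severity failure;
        -- Backward branch by 4 words from PC+4 = 0x00400020.
        assert branch_target(x"FFFC", x"00400020") = x"00400010"
            report "backward branch target" severity failure;
        -- j with the 26-bit field 0x0100004 (opcode 000010).
        assert jump_target(x"08100004", x"00400008") = x"00400010"
            report "jump target" severity failure;
        report "fetch_adrs_demo: all checks passed" severity note;
        wait;
    end process check;

end architecture sim;
```

A failed check stops the simulation with `severity failure`, so a run that reaches the final note has passed every case.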

```
--
-- This module is the MIPS General Purpose Register (GPR) file implementation for HW3
--
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity GPR is
Port(
    CK        : in  STD_LOGIC;
    rd_reg1   : in  STD_LOGIC_VECTOR (4 downto 0);  -- Rs
    rd_reg2   : in  STD_LOGIC_VECTOR (4 downto 0);  -- Rt
    wr_reg    : in  STD_LOGIC_VECTOR (4 downto 0);  -- Rd (in R-Type instruction, Rt in LW)
    rd_data1  : out STD_LOGIC_VECTOR (31 downto 0); -- Rs contents
    rd_data2  : out STD_LOGIC_VECTOR (31 downto 0); -- Rt contents
    wr_data   : in  STD_LOGIC_VECTOR (31 downto 0); -- contents to be written into Rd (or Rt)
    Reg_Write : in  STD_LOGIC;                      -- "0" means no register is written into
    GPR_hold  : in  STD_LOGIC                       -- "1" means no register is written into
);
end GPR;

architecture Behavioral of GPR is

--signals used
signal Equal         : STD_LOGIC;
signal GPR_rd_data1  : STD_LOGIC_VECTOR (31 downto 0); -- Rs contents as read from the memory
signal GPR_rd_data2  : STD_LOGIC_VECTOR (31 downto 0); -- Rt contents as read from the memory
signal GPR_data_out1 : STD_LOGIC_VECTOR (31 downto 0); -- Rs contents driven onto rd_data1
signal GPR_data_out2 : STD_LOGIC_VECTOR (31 downto 0); -- Rt contents driven onto rd_data2
signal GPR_wr_data   : STD_LOGIC_VECTOR (31 downto 0); -- contents to be written (copy of wr_data)

signal GPR_we        : STD_LOGIC; -- the we signal to the memory. made of (Reg_Write and (not GPR_hold))

-- components used
COMPONENT dual_port_memory_no_CK_read IS
GENERIC(
    width : integer := 32;
    depth : integer := 32
);
PORT (
    wr_address  : in  integer range depth-1 downto 0;
    wr_data     : in  std_logic_vector(width-1 downto 0);
    wr_clk      : in  std_logic;
    wr_en       : in  std_logic;
    rd1_address : in  integer range depth-1 downto 0;
    rd1_data    : out std_logic_vector(width-1 downto 0);
    rd2_address : in  integer range depth-1 downto 0;
    rd2_data    : out std_logic_vector(width-1 downto 0)
);
END COMPONENT;

begin

GPR_wr_data <= wr_data;

-- produce rd_data1:
-- Here we ensure that reg 0 is always zero
process(rd_reg1, GPR_rd_data1, wr_reg, GPR_wr_data, Reg_Write)
begin
    if rd_reg1 = b"00000" then
        GPR_data_out1 <= x"00000000";
    elsif rd_reg1 = wr_reg and Reg_Write = '1' then
        GPR_data_out1 <= GPR_wr_data;
    else
        GPR_data_out1 <= GPR_rd_data1;
    end if;
end process;

rd_data1 <= GPR_data_out1;

--process (rd_reg1, wr_reg, Reg_Write) --@@@HW6
--begin
--    if rd_reg1 = wr_reg and Reg_Write = '1' then
--        rd_data1 <= wr_data;
--    else
--        rd_data1 <= GPR_data_out1;
--    end if;
--end process;

-- produce rd_data2:
-- Here we ensure that reg 0 is always zero
process(rd_reg2, GPR_rd_data2, wr_reg, GPR_wr_data, Reg_Write)
begin
    if rd_reg2 = b"00000" then
        GPR_data_out2 <= x"00000000";
    elsif rd_reg2 = wr_reg and Reg_Write = '1' then
        GPR_data_out2 <= GPR_wr_data;
    else
        GPR_data_out2 <= GPR_rd_data2;
    end if;
end process;

rd_data2 <= GPR_data_out2;

--process (rd_reg2, wr_reg, Reg_Write) --@@@HW6
--begin
--    if rd_reg2 = wr_reg and Reg_Write = '1' then
--        rd_data2 <= wr_data;
--    else
--        rd_data2 <= GPR_data_out2;
--    end if;
--end process;

GPR_we <= Reg_Write and (not GPR_hold);

-- connecting the GPR memory
GPR_file : dual_port_memory_no_CK_read
generic map (32, 32)
port map(
    wr_address  => conv_integer(wr_reg),
    wr_data     => GPR_wr_data,
    wr_clk      => CK,
    wr_en       => GPR_we,
    rd1_address => conv_integer(rd_reg1),
    rd1_data    => GPR_rd_data1,
    rd2_address => conv_integer(rd_reg2),
    rd2_data    => GPR_rd_data2
);

end Behavioral;
```
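
The read path of the GPR listing above applies two rules before the memory output is used: register 0 always reads as zero, and a read of the register that is being written returns the new `wr_data`. A minimal self-checking sketch of that priority order follows. The entity `gpr_read_demo`, the function `read_port` and the test values are hypothetical and only mirror the `if`/`elsif`/`else` chain of the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity (not part of the original GPR module).
entity gpr_read_demo is
end entity gpr_read_demo;

architecture sim of gpr_read_demo is

    -- One read port: zero register first, then write bypass, then memory data.
    function read_port(rd_reg    : std_logic_vector(4 downto 0);
                       wr_reg    : std_logic_vector(4 downto 0);
                       reg_write : std_logic;
                       wr_data   : std_logic_vector(31 downto 0);
                       mem_data  : std_logic_vector(31 downto 0))
        return std_logic_vector is
    begin
        if rd_reg = "00000" then
            return x"00000000";          -- reg 0 is always zero
        elsif rd_reg = wr_reg and reg_write = '1' then
            return wr_data;              -- value being written this cycle
        else
            return mem_data;             -- value stored in the memory
        end if;
    end function read_port;

begin

    check : process
    begin
        -- Reading reg 0 gives zero even while reg 0 is the write target.
        assert read_port("00000", "00000", '1', x"DEADBEEF", x"12345678") = x"00000000"
            report "reg 0 must read as zero" severity failure;
        -- Same register read and written: the new data is returned.
        assert read_port("01000", "01000", '1', x"DEADBEEF", x"12345678") = x"DEADBEEF"
            report "write bypass expected" severity failure;
        -- Same register but Reg_Write = '0': the memory data is returned.
        assert read_port("01000", "01000", '0', x"DEADBEEF", x"12345678") = x"12345678"
            report "no bypass when nothing is written" severity failure;
        -- Different registers: the memory data is returned.
        assert read_port("01000", "01001", '1', x"DEADBEEF", x"12345678") = x"12345678"
            report "no bypass for a different register" severity failure;
        report "gpr_read_demo: all checks passed" severity note;
        wait;
    end process check;

end architecture sim;
```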

```
--
-- This module is the MIPS ALU for HW3
--
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity MIPS_ALU is
Port (
    -- ALU operation control inputs
    ALUOP    : in  STD_LOGIC_VECTOR(1 downto 0); -- 00=add, 01=sub, 10=by Function, 11=OR (ori, HW6)
    Funct    : in  STD_LOGIC_VECTOR(5 downto 0); -- 32=ADD, 34=sub, 36=AND, 37=OR, 38=XOR, 42=SLT
    -- data inputs & data control inputs
    A_in     : in  STD_LOGIC_VECTOR(31 downto 0);
    B_in     : in  STD_LOGIC_VECTOR(31 downto 0);
    sext_imm : in  STD_LOGIC_VECTOR(31 downto 0);
    ALUsrcB  : in  STD_LOGIC;
    -- data output
    ALU_out  : out STD_LOGIC_VECTOR(31 downto 0)
);
end MIPS_ALU;

architecture Behavioral of MIPS_ALU is

-- inner signals
signal ALU_cmd     : STD_LOGIC_VECTOR (2 downto 0); -- 000=AND, 001=OR, 010=ADD, 011=XOR, 110=sub, 111=slt, 100=NAND, 101=NOR (not used for now)
signal ALU_A_in    : STD_LOGIC_VECTOR (31 downto 0);
signal ALU_B_in    : STD_LOGIC_VECTOR (31 downto 0);
signal ALU_output  : STD_LOGIC_VECTOR (31 downto 0);

signal sub_rslt    : STD_LOGIC_VECTOR (32 downto 0); -- use this for creating the sign of sub in SLT instruction
signal sign_of_sub : STD_LOGIC;

-- Decoded signals for ID phase
signal LUI : STD_LOGIC; -- '1' when we decode a LUI instruction
signal ORI : STD_LOGIC; -- '1' when we decode an ORI instruction
signal JAL : STD_LOGIC; -- '1' when we decode a JAL instruction

begin

--ORI <= '0';

-- ALU
process(ALUOP, Funct, ORI)
begin
    if ALUOP = b"00" then
        ALU_cmd <= b"010"; -- ADD
    elsif ALUOP = b"01" then
        ALU_cmd <= b"110"; -- SUB
    elsif ALUOP = b"11" then -- @@@ ORi HW6
        ALU_cmd <= b"001";
    else
        if Funct = b"100000" then
            ALU_cmd <= b"010"; -- FUNCT=ADD
        elsif Funct = b"100010" then
            ALU_cmd <= b"110"; -- FUNCT=SUB
        elsif Funct = b"100100" then
            ALU_cmd <= b"000"; -- FUNCT=AND
        elsif Funct = b"100101" then
            ALU_cmd <= b"001"; -- FUNCT=OR
        elsif Funct = b"100110" then
            ALU_cmd <= b"011"; -- FUNCT=XOR
        elsif Funct = b"101010" then
            ALU_cmd <= b"111"; -- FUNCT=SLT
        else
            ALU_cmd <= b"010"; -- ADD
        end if;
    end if;
end process;

---- before forwarding
process(ALUsrcB, sext_imm, B_in)
begin
    if ALUsrcB = '0' then
        ALU_B_in <= B_in;
    else
        ALU_B_in <= sext_imm;
    end if;
end process;
ALU_A_in <= A_in;

-- if we consider both inputs as 2's comp numbers then
sub_rslt <= (ALU_A_in(31) & ALU_A_in) - (ALU_B_in(31) & ALU_B_in);
sign_of_sub <= sub_rslt(32);

process(ALU_A_in, ALU_B_in, ALU_cmd, sign_of_sub)
begin
    case ALU_cmd is
        when b"000" => ALU_output <= ALU_A_in and ALU_B_in;               -- AND
        when b"001" => ALU_output <= ALU_A_in or ALU_B_in;                -- OR
        when b"010" => ALU_output <= ALU_A_in + ALU_B_in;                 -- ADD
        when b"011" => ALU_output <= ALU_A_in xor ALU_B_in;               -- XOR
        when b"100" => ALU_output <= not(ALU_A_in and ALU_B_in);          -- NAND
        when b"101" => ALU_output <= not(ALU_A_in or ALU_B_in);           -- NOR
        when b"110" => ALU_output <= ALU_A_in - ALU_B_in;                 -- SUB
        when b"111" => ALU_output <= x"0000000" & b"000" & sign_of_sub;   -- SLT
        when others => null; -- no assignment for undefined codes
    end case;
end process;

ALU_out <= ALU_output;

end Behavioral;
```
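
The ALU listing above derives SLT from `sub_rslt`, a 33-bit subtraction of the sign-extended operands. The extra bit matters because a 32-bit subtraction can overflow, and then its sign bit no longer tells which operand is smaller. The sketch below compares both variants on such cases. The entity `slt_demo` and the functions `slt33` and `slt32` are hypothetical, and `ieee.numeric_std` is used in place of the packages in the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo entity (not part of the original ALU).
entity slt_demo is
end entity slt_demo;

architecture sim of slt_demo is

    -- SLT as in the listing: sign bit of a 33-bit two's complement subtraction.
    function slt33(a, b : std_logic_vector(31 downto 0)) return std_logic is
        variable diff : signed(32 downto 0);
    begin
        diff := resize(signed(a), 33) - resize(signed(b), 33);
        return diff(32);
    end function slt33;

    -- Naive variant for comparison: sign bit of a 32-bit subtraction (wraps on overflow).
    function slt32(a, b : std_logic_vector(31 downto 0)) return std_logic is
        variable diff : signed(31 downto 0);
    begin
        diff := signed(a) - signed(b);
        return diff(31);
    end function slt32;

begin

    check : process
    begin
        -- Ordinary cases: 3 < 5, not 5 < 3, and -1 < 1.
        assert slt33(x"00000003", x"00000005") = '1' report "3 < 5" severity failure;
        assert slt33(x"00000005", x"00000003") = '0' report "5 < 3" severity failure;
        assert slt33(x"FFFFFFFF", x"00000001") = '1' report "-1 < 1" severity failure;
        -- Most negative value minus 1 overflows in 32 bits.
        assert slt33(x"80000000", x"00000001") = '1' report "min < 1 (33 bit)" severity failure;
        assert slt32(x"80000000", x"00000001") = '0' report "32-bit variant should wrap here" severity failure;
        -- Most positive value minus (-1) overflows in 32 bits.
        assert slt33(x"7FFFFFFF", x"FFFFFFFF") = '0' report "max < -1 (33 bit)" severity failure;
        assert slt32(x"7FFFFFFF", x"FFFFFFFF") = '1' report "32-bit variant should wrap here" severity failure;
        report "slt_demo: all checks passed" severity note;
        wait;
    end process check;

end architecture sim;
```

The two `slt32` checks document the wrong answers of the 32-bit variant; they are there to show the failure mode, not to recommend it.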

```
--
-- dual_port_memory no CK for read for HW3
--
-- Created:
--          by - Danny Seidner, 31/8/2013
--

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

ENTITY dual_port_memory_no_CK_read IS
GENERIC(
    width : integer := 32;
    depth : integer := 32
);
PORT (
    wr_address  : in  integer range depth-1 downto 0;
    wr_data     : in  std_logic_vector(width-1 downto 0);
    wr_clk      : in  std_logic;
    wr_en       : in  std_logic;
    rd1_address : in  integer range depth-1 downto 0;
    rd1_data    : out std_logic_vector(width-1 downto 0);
    rd2_address : in  integer range depth-1 downto 0;
    rd2_data    : out std_logic_vector(width-1 downto 0)
);
END ENTITY dual_port_memory_no_CK_read;

--
ARCHITECTURE dual_port_memory OF dual_port_memory_no_CK_read IS
type Memory_Type is array ((depth-1) downto 0) of std_logic_vector((width-1) downto 0);
shared variable Memory_array : Memory_Type := (others => (others => '0')); -- reset initial value to be 0

BEGIN

Memory_wrdata : PROCESS (wr_clk)
begin
    if wr_clk'event and wr_clk = '1' then
        if wr_en = '1' then
            Memory_array(wr_address) := wr_data;
        end if;
    end if;
end process Memory_wrdata;

Memory_rddata1 : PROCESS (rd1_address, wr_clk) -- need to add wr_clk, otherwise,
                                               -- if we leave rd1_address constant,
                                               -- we won't see changes in rd1_data even
                                               -- when we write new data (in simulation)
begin
    rd1_data <= Memory_array(rd1_address);
end process Memory_rddata1;

Memory_rddata2 : PROCESS (rd2_address, wr_clk) -- need to add wr_clk, see Memory_rddata1 above
begin
    rd2_data <= Memory_array(rd2_address);
end process Memory_rddata2;

END ARCHITECTURE dual_port_memory;
```
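
The comments in the memory listing above explain why `wr_clk` appears in the sensitivity lists of the read processes: a write to the shared variable creates no event, so the read outputs would otherwise not update in simulation. One possible alternative, shown here only as a sketch, is to hold the array in a signal, so that every write is an event and the reads can be plain concurrent assignments. The entity names `regfile_mem_sketch` and `regfile_mem_sketch_tb` and the test values are hypothetical. The port list is copied from the listing, and this is not the author's implementation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical signal-based variant of the register-file memory (sketch only).
entity regfile_mem_sketch is
    generic (
        width : integer := 32;
        depth : integer := 32
    );
    port (
        wr_address  : in  integer range depth-1 downto 0;
        wr_data     : in  std_logic_vector(width-1 downto 0);
        wr_clk      : in  std_logic;
        wr_en       : in  std_logic;
        rd1_address : in  integer range depth-1 downto 0;
        rd1_data    : out std_logic_vector(width-1 downto 0);
        rd2_address : in  integer range depth-1 downto 0;
        rd2_data    : out std_logic_vector(width-1 downto 0)
    );
end entity regfile_mem_sketch;

architecture rtl of regfile_mem_sketch is
    type memory_type is array (depth-1 downto 0) of std_logic_vector(width-1 downto 0);
    -- A signal instead of a shared variable: every write creates an event.
    signal memory_array : memory_type := (others => (others => '0'));
begin
    write_port : process (wr_clk)
    begin
        if rising_edge(wr_clk) then
            if wr_en = '1' then
                memory_array(wr_address) <= wr_data;
            end if;
        end if;
    end process write_port;

    -- Unclocked reads; they re-evaluate on address changes and on writes.
    rd1_data <= memory_array(rd1_address);
    rd2_data <= memory_array(rd2_address);
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking testbench for the sketch above.
entity regfile_mem_sketch_tb is
end entity regfile_mem_sketch_tb;

architecture sim of regfile_mem_sketch_tb is
    signal clk : std_logic := '0';
    signal we  : std_logic := '0';
    signal wa  : integer range 31 downto 0 := 0;
    signal ra1 : integer range 31 downto 0 := 0;
    signal ra2 : integer range 31 downto 0 := 0;
    signal wd  : std_logic_vector(31 downto 0) := (others => '0');
    signal rd1 : std_logic_vector(31 downto 0);
    signal rd2 : std_logic_vector(31 downto 0);
begin
    dut : entity work.regfile_mem_sketch
        port map (
            wr_address  => wa,  wr_data  => wd, wr_clk => clk, wr_en => we,
            rd1_address => ra1, rd1_data => rd1,
            rd2_address => ra2, rd2_data => rd2
        );

    stim : process
    begin
        wa <= 5; wd <= x"CAFEF00D"; we <= '1'; ra1 <= 5; ra2 <= 6;
        wait for 5 ns;
        -- No clock edge yet, so word 5 still holds its initial value.
        assert rd1 = x"00000000" report "data visible before the clock edge" severity failure;
        clk <= '1';  -- the rising edge performs the write
        wait for 5 ns;
        -- rd1_address did not change, yet the new word appears on port 1.
        assert rd1 = x"CAFEF00D" report "written word not visible on port 1" severity failure;
        assert rd2 = x"00000000" report "unwritten word is not zero" severity failure;
        report "regfile_mem_sketch_tb: all checks passed" severity note;
        wait;
    end process stim;
end architecture sim;
```

The read address is held constant across the write, which is exactly the situation the original comment describes.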